A significant number of hotel bookings are called off due to cancellations or no-shows. Typical reasons include changes of plans and scheduling conflicts. Cancelling is often made easier by the option to do so free of charge or at a low cost, which benefits hotel guests but is less desirable, and possibly revenue-diminishing, for hotels. Such losses are particularly high for last-minute cancellations.
The new technologies involving online booking channels have dramatically changed customers’ booking possibilities and behavior. This adds a further dimension to the challenge of how hotels handle cancellations, which are no longer limited to traditional booking and guest characteristics.
The cancellation of bookings impacts a hotel on various fronts:
The increasing number of cancellations calls for a machine-learning-based solution that can help predict which bookings are likely to be canceled. INN Hotels Group has a chain of hotels in Portugal. They are facing problems with the high number of booking cancellations and have reached out to your firm for data-driven solutions. As a data scientist, you have to analyze the data provided to find which factors have a high influence on booking cancellations, build a predictive model that can predict in advance which bookings are going to be canceled, and help formulate profitable policies for cancellations and refunds.
The data contains the different attributes of customers' booking details. The detailed data dictionary is given below.
Data Dictionary
# Warning filter
import warnings
warnings.filterwarnings("ignore")
# data manipulation libraries for Python
import pandas as pd
import numpy as np
# data visualisation libraries for Python
import matplotlib.pyplot as plt
import seaborn as sns
# statistical libraries for Python
import statsmodels.stats.api as sms
from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm
from statsmodels.tools.tools import add_constant
# prediction libraries for Python (Train/Test + Tree)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
from sklearn.model_selection import GridSearchCV
from sklearn import metrics
from sklearn.metrics import (
f1_score,
accuracy_score,
recall_score,
precision_score,
confusion_matrix,
ConfusionMatrixDisplay,  # replaces plot_confusion_matrix, which was removed in scikit-learn 1.2
make_scorer,
roc_auc_score,
roc_curve,
)
# disable display column & row limits
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 250)
# load the data and create a clean working copy
innrec = pd.read_csv('INNHotelsGroup.csv')
data = innrec.copy()
# view first five rows of data
data.head()
| | Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | INN00001 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00 | 0 | Not_Canceled |
| 1 | INN00002 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68 | 1 | Not_Canceled |
| 2 | INN00003 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00 | 0 | Canceled |
| 3 | INN00004 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00 | 0 | Canceled |
| 4 | INN00005 | 2 | 0 | 1 | 1 | Not Selected | 0 | Room_Type 1 | 48 | 2018 | 4 | 11 | Online | 0 | 0 | 0 | 94.50 | 0 | Canceled |
# view last five rows of data
data.tail()
| | Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 36270 | INN36271 | 3 | 0 | 2 | 6 | Meal Plan 1 | 0 | Room_Type 4 | 85 | 2018 | 8 | 3 | Online | 0 | 0 | 0 | 167.80 | 1 | Not_Canceled |
| 36271 | INN36272 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 228 | 2018 | 10 | 17 | Online | 0 | 0 | 0 | 90.95 | 2 | Canceled |
| 36272 | INN36273 | 2 | 0 | 2 | 6 | Meal Plan 1 | 0 | Room_Type 1 | 148 | 2018 | 7 | 1 | Online | 0 | 0 | 0 | 98.39 | 2 | Not_Canceled |
| 36273 | INN36274 | 2 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 63 | 2018 | 4 | 21 | Online | 0 | 0 | 0 | 94.50 | 0 | Canceled |
| 36274 | INN36275 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 207 | 2018 | 12 | 30 | Offline | 0 | 0 | 0 | 161.67 | 0 | Not_Canceled |
# set a seed so the random sample below is the same on every run
np.random.seed(1)
data.sample(n=50)
| | Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30392 | INN30393 | 1 | 0 | 1 | 0 | Not Selected | 0 | Room_Type 1 | 53 | 2018 | 9 | 11 | Online | 0 | 0 | 0 | 94.32 | 0 | Not_Canceled |
| 6685 | INN06686 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 63 | 2018 | 4 | 22 | Online | 0 | 0 | 0 | 105.30 | 1 | Canceled |
| 8369 | INN08370 | 2 | 0 | 2 | 3 | Meal Plan 1 | 0 | Room_Type 4 | 55 | 2018 | 9 | 11 | Online | 0 | 0 | 0 | 106.24 | 0 | Not_Canceled |
| 2055 | INN02056 | 2 | 0 | 0 | 2 | Not Selected | 0 | Room_Type 1 | 53 | 2017 | 12 | 29 | Online | 0 | 0 | 0 | 81.00 | 1 | Not_Canceled |
| 10969 | INN10970 | 1 | 0 | 2 | 4 | Meal Plan 1 | 0 | Room_Type 1 | 245 | 2018 | 7 | 6 | Offline | 0 | 0 | 0 | 110.00 | 0 | Canceled |
| 24881 | INN24882 | 2 | 0 | 3 | 7 | Meal Plan 1 | 0 | Room_Type 2 | 231 | 2018 | 8 | 1 | Online | 0 | 0 | 0 | 81.82 | 2 | Canceled |
| 28658 | INN28659 | 2 | 0 | 0 | 3 | Meal Plan 2 | 0 | Room_Type 1 | 71 | 2018 | 5 | 10 | Offline | 0 | 0 | 0 | 126.00 | 1 | Not_Canceled |
| 20853 | INN20854 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 66 | 2017 | 10 | 9 | Offline | 0 | 0 | 0 | 75.00 | 0 | Canceled |
| 8501 | INN08502 | 2 | 0 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 2 | 40 | 2018 | 1 | 14 | Online | 0 | 0 | 0 | 77.55 | 1 | Not_Canceled |
| 1942 | INN01943 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 63 | 2018 | 8 | 9 | Online | 0 | 0 | 0 | 144.90 | 2 | Not_Canceled |
| 15648 | INN15649 | 2 | 0 | 2 | 4 | Meal Plan 1 | 0 | Room_Type 1 | 209 | 2018 | 7 | 2 | Online | 0 | 0 | 0 | 66.53 | 1 | Not_Canceled |
| 6116 | INN06117 | 2 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 9 | 2018 | 7 | 6 | Online | 0 | 0 | 0 | 139.00 | 1 | Not_Canceled |
| 7868 | INN07869 | 2 | 0 | 2 | 4 | Meal Plan 1 | 0 | Room_Type 4 | 123 | 2018 | 5 | 22 | Online | 0 | 0 | 0 | 114.75 | 1 | Canceled |
| 24527 | INN24528 | 2 | 0 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 118 | 2018 | 6 | 28 | Online | 0 | 0 | 0 | 96.30 | 0 | Canceled |
| 24227 | INN24228 | 2 | 0 | 1 | 0 | Meal Plan 1 | 0 | Room_Type 4 | 72 | 2018 | 10 | 9 | Online | 0 | 0 | 0 | 132.30 | 3 | Not_Canceled |
| 17216 | INN17217 | 2 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 11 | 23 | Online | 0 | 0 | 0 | 120.00 | 0 | Not_Canceled |
| 31124 | INN31125 | 1 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 7 | 2017 | 8 | 28 | Corporate | 1 | 1 | 2 | 65.00 | 0 | Not_Canceled |
| 9101 | INN09102 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 12 | 2018 | 10 | 2 | Online | 0 | 0 | 0 | 6.00 | 0 | Not_Canceled |
| 9474 | INN09475 | 2 | 0 | 0 | 2 | Meal Plan 2 | 0 | Room_Type 1 | 63 | 2017 | 9 | 4 | Offline | 0 | 0 | 0 | 116.00 | 0 | Not_Canceled |
| 12782 | INN12783 | 2 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 18 | 2018 | 1 | 21 | Online | 0 | 0 | 0 | 77.00 | 1 | Not_Canceled |
| 28297 | INN28298 | 1 | 0 | 2 | 5 | Meal Plan 1 | 0 | Room_Type 1 | 68 | 2018 | 8 | 29 | Online | 0 | 0 | 0 | 92.35 | 1 | Not_Canceled |
| 22021 | INN22022 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 42 | 2018 | 11 | 4 | Offline | 0 | 0 | 0 | 72.00 | 0 | Not_Canceled |
| 26586 | INN26587 | 3 | 0 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 4 | 53 | 2018 | 3 | 20 | Online | 0 | 0 | 0 | 124.10 | 1 | Canceled |
| 16756 | INN16757 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 305 | 2018 | 11 | 4 | Offline | 0 | 0 | 0 | 89.00 | 0 | Canceled |
| 20927 | INN20928 | 2 | 0 | 2 | 6 | Meal Plan 1 | 0 | Room_Type 1 | 106 | 2018 | 7 | 8 | Offline | 0 | 0 | 0 | 72.25 | 2 | Not_Canceled |
| 35753 | INN35754 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 128 | 2018 | 6 | 20 | Online | 0 | 0 | 0 | 120.00 | 0 | Canceled |
| 9359 | INN09360 | 1 | 0 | 1 | 0 | Meal Plan 1 | 0 | Room_Type 1 | 7 | 2018 | 5 | 16 | Online | 0 | 0 | 0 | 97.00 | 1 | Not_Canceled |
| 21929 | INN21930 | 1 | 0 | 0 | 4 | Meal Plan 1 | 0 | Room_Type 4 | 48 | 2018 | 8 | 24 | Online | 0 | 0 | 0 | 149.40 | 1 | Not_Canceled |
| 17501 | INN17502 | 2 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 27 | 2018 | 3 | 22 | Online | 0 | 0 | 0 | 129.00 | 0 | Canceled |
| 3355 | INN03356 | 2 | 1 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 173 | 2018 | 8 | 13 | Online | 0 | 0 | 0 | 114.75 | 1 | Canceled |
| 22183 | INN22184 | 1 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 9 | 2018 | 6 | 1 | Online | 0 | 0 | 0 | 97.02 | 1 | Not_Canceled |
| 7818 | INN07819 | 2 | 0 | 0 | 1 | Not Selected | 0 | Room_Type 1 | 57 | 2018 | 12 | 1 | Online | 0 | 0 | 0 | 79.20 | 2 | Not_Canceled |
| 26360 | INN26361 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 245 | 2018 | 6 | 17 | Offline | 0 | 0 | 0 | 75.00 | 0 | Canceled |
| 15193 | INN15194 | 2 | 0 | 0 | 3 | Meal Plan 2 | 0 | Room_Type 1 | 36 | 2017 | 10 | 13 | Offline | 0 | 0 | 0 | 112.00 | 0 | Not_Canceled |
| 19873 | INN19874 | 2 | 0 | 0 | 1 | Not Selected | 0 | Room_Type 1 | 38 | 2018 | 7 | 2 | Online | 0 | 0 | 0 | 107.10 | 1 | Not_Canceled |
| 8015 | INN08016 | 1 | 1 | 0 | 1 | Meal Plan 2 | 0 | Room_Type 1 | 29 | 2018 | 12 | 17 | Online | 0 | 0 | 0 | 130.00 | 0 | Not_Canceled |
| 36151 | INN36152 | 2 | 0 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 26 | 2018 | 2 | 7 | Online | 0 | 0 | 0 | 64.64 | 0 | Not_Canceled |
| 21254 | INN21255 | 1 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 6 | 2018 | 5 | 23 | Online | 0 | 0 | 0 | 97.02 | 1 | Not_Canceled |
| 16881 | INN16882 | 2 | 0 | 1 | 0 | Meal Plan 1 | 0 | Room_Type 1 | 258 | 2018 | 10 | 16 | Offline | 0 | 0 | 0 | 110.00 | 0 | Canceled |
| 9465 | INN09466 | 2 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 4 | 118 | 2018 | 10 | 29 | Online | 0 | 0 | 0 | 104.40 | 1 | Not_Canceled |
| 18785 | INN18786 | 1 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 188 | 2018 | 6 | 15 | Offline | 0 | 0 | 0 | 130.00 | 0 | Canceled |
| 28097 | INN28098 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 39 | 2018 | 3 | 14 | Offline | 0 | 0 | 0 | 85.00 | 0 | Not_Canceled |
| 29239 | INN29240 | 1 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 32 | 2017 | 11 | 20 | Offline | 0 | 0 | 0 | 73.00 | 0 | Not_Canceled |
| 664 | INN00665 | 2 | 1 | 1 | 1 | Meal Plan 1 | 1 | Room_Type 1 | 27 | 2018 | 8 | 8 | Online | 0 | 0 | 0 | 195.50 | 2 | Not_Canceled |
| 8285 | INN08286 | 2 | 0 | 2 | 1 | Not Selected | 0 | Room_Type 1 | 240 | 2018 | 12 | 10 | Online | 0 | 0 | 0 | 67.50 | 2 | Canceled |
| 4598 | INN04599 | 2 | 0 | 1 | 0 | Not Selected | 0 | Room_Type 1 | 127 | 2018 | 7 | 25 | Online | 0 | 0 | 0 | 94.50 | 1 | Not_Canceled |
| 21340 | INN21341 | 2 | 2 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 6 | 24 | 2018 | 4 | 14 | Online | 0 | 0 | 0 | 207.00 | 2 | Not_Canceled |
| 12098 | INN12099 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 262 | 2018 | 12 | 26 | Online | 0 | 0 | 0 | 73.95 | 0 | Canceled |
| 2904 | INN02905 | 2 | 0 | 0 | 5 | Meal Plan 1 | 0 | Room_Type 4 | 41 | 2018 | 10 | 4 | Online | 0 | 0 | 0 | 139.50 | 2 | Not_Canceled |
| 21902 | INN21903 | 2 | 0 | 2 | 3 | Meal Plan 1 | 0 | Room_Type 4 | 101 | 2018 | 5 | 5 | Online | 0 | 0 | 0 | 132.60 | 1 | Not_Canceled |
# view the size of the data set
data.shape
(36275, 19)
# view the data types of the data set
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36275 entries, 0 to 36274
Data columns (total 19 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   Booking_ID                            36275 non-null  object
 1   no_of_adults                          36275 non-null  int64
 2   no_of_children                        36275 non-null  int64
 3   no_of_weekend_nights                  36275 non-null  int64
 4   no_of_week_nights                     36275 non-null  int64
 5   type_of_meal_plan                     36275 non-null  object
 6   required_car_parking_space            36275 non-null  int64
 7   room_type_reserved                    36275 non-null  object
 8   lead_time                             36275 non-null  int64
 9   arrival_year                          36275 non-null  int64
 10  arrival_month                         36275 non-null  int64
 11  arrival_date                          36275 non-null  int64
 12  market_segment_type                   36275 non-null  object
 13  repeated_guest                        36275 non-null  int64
 14  no_of_previous_cancellations          36275 non-null  int64
 15  no_of_previous_bookings_not_canceled  36275 non-null  int64
 16  avg_price_per_room                    36275 non-null  float64
 17  no_of_special_requests                36275 non-null  int64
 18  booking_status                        36275 non-null  object
dtypes: float64(1), int64(13), object(5)
memory usage: 5.3+ MB
# check for duplicate values
data[data.duplicated()].count()
Booking_ID                              0
no_of_adults                            0
no_of_children                          0
no_of_weekend_nights                    0
no_of_week_nights                       0
type_of_meal_plan                       0
required_car_parking_space              0
room_type_reserved                      0
lead_time                               0
arrival_year                            0
arrival_month                           0
arrival_date                            0
market_segment_type                     0
repeated_guest                          0
no_of_previous_cancellations            0
no_of_previous_bookings_not_canceled    0
avg_price_per_room                      0
no_of_special_requests                  0
booking_status                          0
dtype: int64
# check for null values (every count below is 0, so the data is complete)
data.isnull().sum()
Booking_ID                              0
no_of_adults                            0
no_of_children                          0
no_of_weekend_nights                    0
no_of_week_nights                       0
type_of_meal_plan                       0
required_car_parking_space              0
room_type_reserved                      0
lead_time                               0
arrival_year                            0
arrival_month                           0
arrival_date                            0
market_segment_type                     0
repeated_guest                          0
no_of_previous_cancellations            0
no_of_previous_bookings_not_canceled    0
avg_price_per_room                      0
no_of_special_requests                  0
booking_status                          0
dtype: int64
# drop the 'Booking_ID' column from the data set
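The duplicate and null checks above print one count per column. When only a yes/no answer is needed, a scalar summary can be easier to read. The following is a minimal sketch on a small made-up frame; the `toy` values are assumptions for illustration, not rows from the hotel data.

```python
import pandas as pd

# Toy frame standing in for the bookings data (illustrative values only)
toy = pd.DataFrame(
    {
        "lead_time": [224, 5, 5, 211],
        "avg_price_per_room": [65.0, 106.68, 106.68, None],
    }
)

# Number of rows that repeat an earlier row in full
n_duplicate_rows = int(toy.duplicated().sum())

# Number of missing cells across all columns
n_missing_cells = int(toy.isnull().sum().sum())

print(n_duplicate_rows, n_missing_cells)
```

On the real data the same two expressions would be applied to `data` instead of `toy`.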
data = data.drop('Booking_ID', axis=1)
# view what are the values in object data types
cat_columns = ['type_of_meal_plan', 'room_type_reserved', 'market_segment_type', 'booking_status']
for i in cat_columns:
    print(data[i].value_counts())
    print("*" * 50)
Meal Plan 1     27835
Not Selected     5130
Meal Plan 2      3305
Meal Plan 3         5
Name: type_of_meal_plan, dtype: int64
**************************************************
Room_Type 1    28130
Room_Type 4     6057
Room_Type 6      966
Room_Type 2      692
Room_Type 5      265
Room_Type 7      158
Room_Type 3        7
Name: room_type_reserved, dtype: int64
**************************************************
Online           23214
Offline          10528
Corporate         2017
Complementary      391
Aviation           125
Name: market_segment_type, dtype: int64
**************************************************
Not_Canceled    24390
Canceled        11885
Name: booking_status, dtype: int64
**************************************************
data.describe()
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | required_car_parking_space | lead_time | arrival_year | arrival_month | arrival_date | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 |
| mean | 1.844962 | 0.105279 | 0.810724 | 2.204300 | 0.030986 | 85.232557 | 2017.820427 | 7.423653 | 15.596995 | 0.025637 | 0.023349 | 0.153411 | 103.423539 | 0.619655 |
| std | 0.518715 | 0.402648 | 0.870644 | 1.410905 | 0.173281 | 85.930817 | 0.383836 | 3.069894 | 8.740447 | 0.158053 | 0.368331 | 1.754171 | 35.089424 | 0.786236 |
| min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2017.000000 | 1.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 2.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 17.000000 | 2018.000000 | 5.000000 | 8.000000 | 0.000000 | 0.000000 | 0.000000 | 80.300000 | 0.000000 |
| 50% | 2.000000 | 0.000000 | 1.000000 | 2.000000 | 0.000000 | 57.000000 | 2018.000000 | 8.000000 | 16.000000 | 0.000000 | 0.000000 | 0.000000 | 99.450000 | 0.000000 |
| 75% | 2.000000 | 0.000000 | 2.000000 | 3.000000 | 0.000000 | 126.000000 | 2018.000000 | 10.000000 | 23.000000 | 0.000000 | 0.000000 | 0.000000 | 120.000000 | 1.000000 |
| max | 4.000000 | 10.000000 | 7.000000 | 17.000000 | 1.000000 | 443.000000 | 2018.000000 | 12.000000 | 31.000000 | 1.000000 | 13.000000 | 58.000000 | 540.000000 | 5.000000 |
data.describe(include = ['object'])
| | type_of_meal_plan | room_type_reserved | market_segment_type | booking_status |
|---|---|---|---|---|
| count | 36275 | 36275 | 36275 | 36275 |
| unique | 4 | 7 | 5 | 2 |
| top | Meal Plan 1 | Room_Type 1 | Online | Not_Canceled |
| freq | 27835 | 28130 | 23214 | 24390 |
Leading Questions:
# function to create labeled barplots
def labeled_barplot(data, feature, perc=False, n=None):
    """
    Barplot with a count or percentage label on top of each bar

    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """
    total = len(data[feature])  # number of rows in the column
    count = data[feature].nunique()  # number of distinct levels
    if n is None:
        plt.figure(figsize=(count + 2, 6))
    else:
        plt.figure(figsize=(n + 2, 6))

    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n].sort_values(),
    )

    for p in ax.patches:
        if perc:
            # percentage of each level of the category
            label = "{:.1f}%".format(100 * p.get_height() / total)
        else:
            label = p.get_height()  # count of each level of the category

        x = p.get_x() + p.get_width() / 2  # horizontal center of the bar
        y = p.get_height()  # top of the bar

        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # place the label just above the bar

    plt.show()  # show the plot
# function to plot a boxplot and a histogram along the same scale.
def histogram_boxplot(data, feature, figsize=(12, 7), kde=False, bins=None):
    """
    Boxplot and histogram combined

    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (12,7))
    kde: whether to show the density curve (default False)
    bins: number of bins for histogram (default None)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  # two rows: boxplot on top, histogram below
        sharex=True,  # x-axis is shared between the two subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )
    # boxplot; showmeans=True adds a marker at the mean of the column
    sns.boxplot(data=data, x=feature, ax=ax_box2, showmeans=True, color="violet")
    # histogram; pass bins only when the caller supplied a value
    if bins:
        sns.histplot(
            data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins, palette="winter"
        )
    else:
        sns.histplot(data=data, x=feature, kde=kde, ax=ax_hist2)
    # dashed green line at the mean
    ax_hist2.axvline(data[feature].mean(), color="green", linestyle="--")
    # solid black line at the median
    ax_hist2.axvline(data[feature].median(), color="black", linestyle="-")
def stacked_barplot(data, predictor, target, perc=False):
    """
    Print the category counts and plot a stacked bar chart

    data: dataframe
    predictor: independent variable
    target: target variable
    """
    count = data[predictor].nunique()
    sorter = data[target].value_counts().index[-1]
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    tab.plot(kind="bar", stacked=True, figsize=(count + 5, 5))
    plt.legend(loc="lower left", frameon=False)
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()
labeled_barplot(data, 'arrival_month', perc=True, n=None)
labeled_barplot(data, 'market_segment_type', perc=True, n=None)
histogram_boxplot(data, 'avg_price_per_room')
# how many free rooms does the hotel give away?
data[data['avg_price_per_room']==0]
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status | length_stay |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 63 | 1 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 2 | 2017 | 9 | 10 | Complementary | 0 | 0 | 0 | 0.0 | 1 | False | 1 |
| 145 | 1 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 13 | 2018 | 6 | 1 | Complementary | 1 | 3 | 5 | 0.0 | 1 | False | 2 |
| 209 | 1 | 0 | 0 | 0 | Meal Plan 1 | 0 | Room_Type 1 | 4 | 2018 | 2 | 27 | Complementary | 0 | 0 | 0 | 0.0 | 1 | False | 0 |
| 266 | 1 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2017 | 8 | 12 | Complementary | 1 | 0 | 1 | 0.0 | 1 | False | 2 |
| 267 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 4 | 2017 | 8 | 23 | Complementary | 0 | 0 | 0 | 0.0 | 1 | False | 3 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 35983 | 1 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 7 | 0 | 2018 | 6 | 7 | Complementary | 1 | 4 | 17 | 0.0 | 1 | False | 1 |
| 36080 | 1 | 0 | 1 | 1 | Meal Plan 1 | 0 | Room_Type 7 | 0 | 2018 | 3 | 21 | Complementary | 1 | 3 | 15 | 0.0 | 1 | False | 2 |
| 36114 | 1 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 3 | 2 | Online | 0 | 0 | 0 | 0.0 | 0 | False | 1 |
| 36217 | 2 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 2 | 3 | 2017 | 8 | 9 | Online | 0 | 0 | 0 | 0.0 | 2 | False | 3 |
| 36250 | 1 | 0 | 0 | 2 | Meal Plan 2 | 0 | Room_Type 1 | 6 | 2017 | 12 | 10 | Online | 0 | 0 | 0 | 0.0 | 0 | False | 2 |
545 rows × 19 columns
data.loc[data['avg_price_per_room']==0, 'market_segment_type'].value_counts()
Complementary    354
Online           191
Name: market_segment_type, dtype: int64
sns.boxplot(data=data, y='avg_price_per_room' , x='market_segment_type')
labeled_barplot(data, 'booking_status', perc=True, n=None)
stacked_barplot(data,'repeated_guest','booking_status')
booking_status  Canceled  Not_Canceled    All
repeated_guest
All                11885         24390  36275
0                  11869         23476  35345
1                     16           914    930
------------------------------------------------------------------------------------------------------------------------
sns.catplot(data=data, y='no_of_special_requests', hue='booking_status', kind='count' )
Leading Questions Answered:
What are the busiest months in the hotel?
October (month 10) is the busiest month, with 14.7% of all bookings in the data.
Which market segment do most of the guests come from?
Online: 23,214 bookings, or 64% of the total, come via the internet.
Hotel rates are dynamic and change according to demand and customer demographics. What are the differences in room prices in different market segments?
Online bookings have the highest prices, despite also having the most free rooms outside the Complementary segment (191 bookings at a price of 0, which I suppose are redeemed through online retailers' points systems). Aviation, Offline, and Corporate are generally slightly lower priced, with Corporate edging out the others for the lowest. Complementary bookings are, of course, mostly free (354 of 391 are priced at 0).
What percentage of bookings are canceled?
About one third of bookings (11,885 of 36,275, or 32.8%) are canceled in the sample data.
Repeating guests are the guests who stay in the hotel often and are important to brand equity. What percentage of repeating guests cancel?
Repeating guests rarely cancel: 16 of 930, or about 1.7%.
Many guests have special requirements when booking a hotel room. Do these requirements affect booking cancellation?
Bookings with no special requests are the most likely to be canceled. Adding requests reduces that likelihood: it begins to fall at one request and drops progressively, reaching zero at three requests.
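The percentages quoted in these answers can be recomputed from the crosstab that `stacked_barplot` printed for `repeated_guest`. A minimal sketch, with the four counts copied from that printed output; the variable names are assumptions introduced here.

```python
import pandas as pd

# Counts copied from the repeated_guest crosstab printed above
counts = pd.DataFrame(
    {"Canceled": [11869, 16], "Not_Canceled": [23476, 914]},
    index=pd.Index([0, 1], name="repeated_guest"),
)

# Cancellation rate per group in percent (row-wise share of canceled bookings)
cancel_rate_by_group = 100 * counts["Canceled"] / counts.sum(axis=1)

# Overall cancellation rate in percent
overall_cancel_rate = 100 * counts["Canceled"].sum() / counts.to_numpy().sum()

print(cancel_rate_by_group.round(2))
print(round(overall_cancel_rate, 2))
```

The same row-wise shares are what `pd.crosstab(..., normalize="index")` produces inside `stacked_barplot`, so this is only a check on the arithmetic, not a new analysis.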
Univariate
labeled_barplot(data, 'no_of_adults', perc=True)
labeled_barplot(data, 'no_of_children', perc=True)
labeled_barplot(data, 'no_of_weekend_nights', perc=True)
labeled_barplot(data, 'no_of_week_nights', perc=True)
labeled_barplot(data, 'required_car_parking_space', perc=True)
labeled_barplot(data, 'room_type_reserved', perc=True)
labeled_barplot(data, 'type_of_meal_plan', perc=True)
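# lead_time is a wide-ranging numeric column (0 to 443 days), so a histogram reads better than one bar per value
histogram_boxplot(data, 'lead_time')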
histogram_boxplot(data, 'no_of_previous_cancellations')
histogram_boxplot(data, 'no_of_previous_bookings_not_canceled')
plt.figure(figsize=(20,10))
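sns.heatmap(
    # restrict to numeric columns; the object columns cannot be correlated
    data.select_dtypes(include=np.number).corr(), annot=True, vmin=-1, vmax=1, fmt='.2f')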
# how does lead time affect cancellation?
sns.catplot(data=data, x='lead_time', hue='booking_status', kind='count' )
plt.xticks(rotation=45)
plt.show()
# do weekend and weekday nights show different cancellation patterns?
sns.catplot(data=data, y='no_of_week_nights', hue='booking_status', kind='count' )
plt.show()
sns.catplot(data=data, y='no_of_weekend_nights', hue='booking_status', kind='count' )
plt.show()
# having established that some months are busier than others, does pricing for that demand show up in the data?
plt.figure(figsize=(10, 5))
sns.lineplot(data=data, x='arrival_month', y='avg_price_per_room')
plt.show()
# new column for length of stay
data['length_stay'] = data['no_of_weekend_nights'] + data['no_of_week_nights']
sns.pairplot(data[['no_of_weekend_nights','no_of_week_nights','required_car_parking_space',
'lead_time','avg_price_per_room','no_of_special_requests','type_of_meal_plan',
'room_type_reserved','market_segment_type','booking_status','length_stay']]);
data.loc[data['booking_status']=='Not_Canceled','booking_status'] = False
data.loc[data['booking_status']=='Canceled','booking_status'] = True
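The two `.loc` assignments above replace the labels with `True`/`False` but leave the column with `object` dtype (the later `dummy_data.info()` output lists `booking_status` as `object`). If a true boolean column is wanted, comparing against the positive label is one option. A minimal sketch on a three-row toy Series; the name `is_canceled` is an assumption, not something the notebook defines.

```python
import pandas as pd

# Toy stand-in for the booking_status column
status = pd.Series(["Not_Canceled", "Canceled", "Canceled"], name="booking_status")

# Comparing against the positive label yields a boolean Series directly
is_canceled = status.eq("Canceled")

print(is_canceled.dtype)
print(is_canceled.tolist())
```

A boolean (or 0/1 integer) target is generally easier to pass to scikit-learn and statsmodels than an `object` column holding Python booleans.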
numeric_columns = data.select_dtypes(include=np.number).columns.tolist()
# drop arrival_year because it is a time field and not helpful here
numeric_columns.remove("arrival_year")
# boxplots of all numeric columns on shared axes; a single call replaces the loop,
# which redrew the same plot on every iteration
plt.figure(figsize=(15, 12))
data[numeric_columns].boxplot()
plt.xticks(rotation=45)
plt.show()
There are two columns with heavy outliers, lead_time and avg_price_per_room. I will only treat avg_price_per_room with a log transform, because I am going to bin lead_time and that should handle its outliers.
# computing the IQR for avg_price_per_room
quartiles = np.quantile(data['avg_price_per_room'][data['avg_price_per_room'].notnull()], [.25, .75])
power_4iqr = 4 * (quartiles[1] - quartiles[0])
print(f'Q1 = {quartiles[0]}, Q3 = {quartiles[1]}, 4*IQR = {power_4iqr}')
outlier_powers = data.loc[np.abs(data['avg_price_per_room'] - data['avg_price_per_room'].median()) > power_4iqr, 'avg_price_per_room']
outlier_powers.shape
Q1 = 80.3, Q3 = 120.0, 4*IQR = 158.8
(49,)
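The rule used above flags a price as an outlier when it lies more than 4 × IQR from the median. With Q1 = 80.3, Q3 = 120.0 and the median of 99.45 from `describe()`, that means prices above 99.45 + 158.8 = 258.25 (no price can fall below the negative lower bound). A minimal sketch of the same rule as a reusable function; the helper name `far_from_median` and the toy prices are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def far_from_median(values: pd.Series, k: float = 4.0) -> pd.Series:
    """Boolean mask that is True where |x - median| > k * IQR (hypothetical helper)."""
    q1, q3 = np.quantile(values.dropna(), [0.25, 0.75])
    return (values - values.median()).abs() > k * (q3 - q1)


# Toy prices: five ordinary values and one extreme value
prices = pd.Series([80.0, 90.0, 100.0, 110.0, 120.0, 540.0])
mask = far_from_median(prices)

print(prices[mask].tolist())
```

Note that this is a wider fence than the usual 1.5 × IQR boxplot whiskers, so it only picks out the most extreme prices.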
# creating a list of columns
dist_cols = [
item for item in data.select_dtypes(include=np.number).columns
]
plt.figure(figsize=(15, 45))
# loop over the list and plot one histogram per column
for i in range(len(dist_cols)):
    plt.subplot(12, 3, i + 1)
    plt.hist(data[dist_cols[i]], bins=50)
    plt.title(dist_cols[i], fontsize=15)
plt.tight_layout()
plt.show()
data2 = data.copy()
# removing because they are close to normal
dist_cols.remove('no_of_week_nights')
dist_cols.remove('no_of_adults')
dist_cols.remove('length_stay')
dist_cols.remove('avg_price_per_room')
# removing because they are boolean or time related
dist_cols.remove('arrival_year')
dist_cols.remove('required_car_parking_space')
dist_cols.remove('arrival_date')
dist_cols.remove('arrival_month')
dist_cols.remove('repeated_guest')
# removing because I have a different treatment in mind
dist_cols.remove('lead_time')
# using log transforms on some columns
for col in dist_cols:
    data2[col + "_log"] = np.log(data2[col] + 1)
# dropping the original columns
data2.drop(dist_cols, axis=1, inplace=True)
data2.head()
| | no_of_adults | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | avg_price_per_room | booking_status | length_stay | no_of_children_log | no_of_weekend_nights_log | no_of_previous_cancellations_log | no_of_previous_bookings_not_canceled_log | no_of_special_requests_log |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 65.00 | False | 3 | 0.0 | 0.693147 | 0.0 | 0.0 | 0.000000 |
| 1 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 106.68 | False | 5 | 0.0 | 1.098612 | 0.0 | 0.0 | 0.693147 |
| 2 | 1 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 60.00 | True | 3 | 0.0 | 1.098612 | 0.0 | 0.0 | 0.000000 |
| 3 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 100.00 | True | 2 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 |
| 4 | 2 | 1 | Not Selected | 0 | Room_Type 1 | 48 | 2018 | 4 | 11 | Online | 0 | 94.50 | True | 2 | 0.0 | 0.693147 | 0.0 | 0.0 | 0.000000 |
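The `_log` columns come from `np.log(x + 1)`, which keeps zero counts at zero instead of sending them to negative infinity. NumPy also provides `np.log1p` for the same transform. A minimal sketch on a few count values taken from the ranges in `describe()`; the array name is an assumption.

```python
import numpy as np

# Example counts, including the maxima 13 and 58 seen in describe()
counts = np.array([0, 1, 2, 13, 58])

log_plus_one = np.log(counts + 1)  # the form used in the loop above
log1p = np.log1p(counts)           # equivalent built-in

print(np.round(log1p, 6))
```

A count of 1 maps to ln 2 ≈ 0.693147 and a count of 2 to ln 3 ≈ 1.098612, which are the values visible in the `no_of_weekend_nights_log` column of the table above.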
# viewing the distributions after the log transformation.
dist_cols = [
item for item in data2.select_dtypes(include=np.number).columns
]
# plot histogram of all numeric columns
plt.figure(figsize=(15, 45))
for i in range(len(dist_cols)):
    plt.subplot(12, 3, i + 1)
    # histplot with kde=True already draws the histogram, so the extra plt.hist call is dropped
    sns.histplot(data=data2, x=dist_cols[i], bins=50, kde=True)
    plt.title(dist_cols[i], fontsize=25)
plt.tight_layout()
plt.show()
# one-hot encoding of categorical variables
dummy_data = pd.get_dummies(
data2,
columns = [
'type_of_meal_plan',
'room_type_reserved',
'market_segment_type',
],
drop_first=True,
)
dummy_data.head()
| | no_of_adults | no_of_week_nights | required_car_parking_space | lead_time | arrival_year | arrival_month | arrival_date | repeated_guest | avg_price_per_room | booking_status | length_stay | no_of_children_log | no_of_weekend_nights_log | no_of_previous_cancellations_log | no_of_previous_bookings_not_canceled_log | no_of_special_requests_log | type_of_meal_plan_Meal Plan 2 | type_of_meal_plan_Meal Plan 3 | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 2 | room_type_reserved_Room_Type 3 | room_type_reserved_Room_Type 4 | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | market_segment_type_Complementary | market_segment_type_Corporate | market_segment_type_Offline | market_segment_type_Online |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 2 | 0 | 224 | 2017 | 10 | 2 | 0 | 65.00 | False | 3 | 0.0 | 0.693147 | 0.0 | 0.0 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | 2 | 3 | 0 | 5 | 2018 | 11 | 6 | 0 | 106.68 | False | 5 | 0.0 | 1.098612 | 0.0 | 0.0 | 0.693147 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 1 | 1 | 0 | 1 | 2018 | 2 | 28 | 0 | 60.00 | True | 3 | 0.0 | 1.098612 | 0.0 | 0.0 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 2 | 2 | 0 | 211 | 2018 | 5 | 20 | 0 | 100.00 | True | 2 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | 2 | 1 | 0 | 48 | 2018 | 4 | 11 | 0 | 94.50 | True | 2 | 0.0 | 0.693147 | 0.0 | 0.0 | 0.000000 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
dummy_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36275 entries, 0 to 36274
Data columns (total 29 columns):
 #   Column                                    Non-Null Count  Dtype
---  ------                                    --------------  -----
 0   no_of_adults                              36275 non-null  int64
 1   no_of_week_nights                         36275 non-null  int64
 2   required_car_parking_space                36275 non-null  int64
 3   lead_time                                 36275 non-null  int64
 4   arrival_year                              36275 non-null  int64
 5   arrival_month                             36275 non-null  int64
 6   arrival_date                              36275 non-null  int64
 7   repeated_guest                            36275 non-null  int64
 8   avg_price_per_room                        36275 non-null  float64
 9   booking_status                            36275 non-null  object
 10  length_stay                               36275 non-null  int64
 11  no_of_children_log                        36275 non-null  float64
 12  no_of_weekend_nights_log                  36275 non-null  float64
 13  no_of_previous_cancellations_log          36275 non-null  float64
 14  no_of_previous_bookings_not_canceled_log  36275 non-null  float64
 15  no_of_special_requests_log                36275 non-null  float64
 16  type_of_meal_plan_Meal Plan 2             36275 non-null  uint8
 17  type_of_meal_plan_Meal Plan 3             36275 non-null  uint8
 18  type_of_meal_plan_Not Selected            36275 non-null  uint8
 19  room_type_reserved_Room_Type 2            36275 non-null  uint8
 20  room_type_reserved_Room_Type 3            36275 non-null  uint8
 21  room_type_reserved_Room_Type 4            36275 non-null  uint8
 22  room_type_reserved_Room_Type 5            36275 non-null  uint8
 23  room_type_reserved_Room_Type 6            36275 non-null  uint8
 24  room_type_reserved_Room_Type 7            36275 non-null  uint8
 25  market_segment_type_Complementary         36275 non-null  uint8
 26  market_segment_type_Corporate             36275 non-null  uint8
 27  market_segment_type_Offline               36275 non-null  uint8
 28  market_segment_type_Online                36275 non-null  uint8
dtypes: float64(6), int64(9), object(1), uint8(13)
memory usage: 4.9+ MB
dummied_cut = pd.cut(dummy_data['lead_time'], 5, labels=['lat_min','short','med','long','advanced'])
dummied_cut.head(10)
0        med
1    lat_min
2    lat_min
3        med
4    lat_min
5       long
6    lat_min
7    lat_min
8      short
9    lat_min
Name: lead_time, dtype: category
Categories (5, object): ['lat_min' < 'short' < 'med' < 'long' < 'advanced']
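Passing an integer to `pd.cut` splits the observed range into that many equal-width, right-closed bins. The sketch below reproduces that rule in plain Python so the labels are easier to interpret. The helper `equal_width_label` and the 0 to 500 day range are assumptions made for illustration; the real bin edges depend on the minimum and maximum `lead_time` in the data, and pandas also pads the range slightly so the minimum falls inside the first bin (the clamp below plays that role).

```python
import math


def equal_width_label(value, low, high, labels):
    """Label a value using equal-width, right-closed bins over [low, high]."""
    width = (high - low) / len(labels)
    # A value sitting exactly on an interior edge belongs to the lower bin.
    index = math.ceil((value - low) / width) - 1
    # Clamp so the minimum lands in the first bin and the maximum in the last.
    index = min(max(index, 0), len(labels) - 1)
    return labels[index]


labels = ["lat_min", "short", "med", "long", "advanced"]

# Hypothetical range of 0 to 500 days, giving bins 100 days wide.
for days in (1, 48, 150, 224, 480):
    print(days, equal_width_label(days, 0, 500, labels))
```

Because the bins are equal in width rather than equal in count, most bookings can end up in the first one or two categories when lead times are right-skewed.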
data3 = pd.merge(dummy_data, dummied_cut, left_index=True, right_index=True)
data3.head().T
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| no_of_adults | 2 | 2 | 1 | 2 | 2 |
| no_of_week_nights | 2 | 3 | 1 | 2 | 1 |
| required_car_parking_space | 0 | 0 | 0 | 0 | 0 |
| lead_time_x | 224 | 5 | 1 | 211 | 48 |
| arrival_year | 2017 | 2018 | 2018 | 2018 | 2018 |
| arrival_month | 10 | 11 | 2 | 5 | 4 |
| arrival_date | 2 | 6 | 28 | 20 | 11 |
| repeated_guest | 0 | 0 | 0 | 0 | 0 |
| avg_price_per_room | 65.0 | 106.68 | 60.0 | 100.0 | 94.5 |
| booking_status | False | False | True | True | True |
| length_stay | 3 | 5 | 3 | 2 | 2 |
| no_of_children_log | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| no_of_weekend_nights_log | 0.693147 | 1.098612 | 1.098612 | 0.0 | 0.693147 |
| no_of_previous_cancellations_log | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| no_of_previous_bookings_not_canceled_log | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| no_of_special_requests_log | 0.0 | 0.693147 | 0.0 | 0.0 | 0.0 |
| type_of_meal_plan_Meal Plan 2 | 0 | 0 | 0 | 0 | 0 |
| type_of_meal_plan_Meal Plan 3 | 0 | 0 | 0 | 0 | 0 |
| type_of_meal_plan_Not Selected | 0 | 1 | 0 | 0 | 1 |
| room_type_reserved_Room_Type 2 | 0 | 0 | 0 | 0 | 0 |
| room_type_reserved_Room_Type 3 | 0 | 0 | 0 | 0 | 0 |
| room_type_reserved_Room_Type 4 | 0 | 0 | 0 | 0 | 0 |
| room_type_reserved_Room_Type 5 | 0 | 0 | 0 | 0 | 0 |
| room_type_reserved_Room_Type 6 | 0 | 0 | 0 | 0 | 0 |
| room_type_reserved_Room_Type 7 | 0 | 0 | 0 | 0 | 0 |
| market_segment_type_Complementary | 0 | 0 | 0 | 0 | 0 |
| market_segment_type_Corporate | 0 | 0 | 0 | 0 | 0 |
| market_segment_type_Offline | 1 | 0 | 0 | 0 | 0 |
| market_segment_type_Online | 0 | 1 | 1 | 1 | 1 |
| lead_time_y | med | lat_min | lat_min | med | lat_min |
# dropping arrival_date and arrival_year, plus lead_time_x since it has been binned into 5 categories (lead_time_y)
data3_5 = data3.drop(['lead_time_x', 'arrival_date', 'arrival_year'], axis=1)
data4 = pd.get_dummies(
    data3_5,
    columns=[
        'lead_time_y',
    ],
    drop_first=True,
)
data4.head().T
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| no_of_adults | 2 | 2 | 1 | 2 | 2 |
| no_of_week_nights | 2 | 3 | 1 | 2 | 1 |
| required_car_parking_space | 0 | 0 | 0 | 0 | 0 |
| arrival_month | 10 | 11 | 2 | 5 | 4 |
| repeated_guest | 0 | 0 | 0 | 0 | 0 |
| avg_price_per_room | 65.0 | 106.68 | 60.0 | 100.0 | 94.5 |
| booking_status | False | False | True | True | True |
| length_stay | 3 | 5 | 3 | 2 | 2 |
| no_of_children_log | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| no_of_weekend_nights_log | 0.693147 | 1.098612 | 1.098612 | 0.0 | 0.693147 |
| no_of_previous_cancellations_log | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| no_of_previous_bookings_not_canceled_log | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| no_of_special_requests_log | 0.0 | 0.693147 | 0.0 | 0.0 | 0.0 |
| type_of_meal_plan_Meal Plan 2 | 0 | 0 | 0 | 0 | 0 |
| type_of_meal_plan_Meal Plan 3 | 0 | 0 | 0 | 0 | 0 |
| type_of_meal_plan_Not Selected | 0 | 1 | 0 | 0 | 1 |
| room_type_reserved_Room_Type 2 | 0 | 0 | 0 | 0 | 0 |
| room_type_reserved_Room_Type 3 | 0 | 0 | 0 | 0 | 0 |
| room_type_reserved_Room_Type 4 | 0 | 0 | 0 | 0 | 0 |
| room_type_reserved_Room_Type 5 | 0 | 0 | 0 | 0 | 0 |
| room_type_reserved_Room_Type 6 | 0 | 0 | 0 | 0 | 0 |
| room_type_reserved_Room_Type 7 | 0 | 0 | 0 | 0 | 0 |
| market_segment_type_Complementary | 0 | 0 | 0 | 0 | 0 |
| market_segment_type_Corporate | 0 | 0 | 0 | 0 | 0 |
| market_segment_type_Offline | 1 | 0 | 0 | 0 | 0 |
| market_segment_type_Online | 0 | 1 | 1 | 1 | 1 |
| lead_time_y_short | 0 | 0 | 0 | 0 | 0 |
| lead_time_y_med | 1 | 0 | 0 | 1 | 0 |
| lead_time_y_long | 0 | 0 | 0 | 0 | 0 |
| lead_time_y_advanced | 0 | 0 | 0 | 0 | 0 |
data4 = data4.astype(float)
data4.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36275 entries, 0 to 36274
Data columns (total 30 columns):
 #   Column                                    Non-Null Count  Dtype
---  ------                                    --------------  -----
 0   no_of_adults                              36275 non-null  float64
 1   no_of_week_nights                         36275 non-null  float64
 2   required_car_parking_space                36275 non-null  float64
 3   arrival_month                             36275 non-null  float64
 4   repeated_guest                            36275 non-null  float64
 5   avg_price_per_room                        36275 non-null  float64
 6   booking_status                            36275 non-null  float64
 7   length_stay                               36275 non-null  float64
 8   no_of_children_log                        36275 non-null  float64
 9   no_of_weekend_nights_log                  36275 non-null  float64
 10  no_of_previous_cancellations_log          36275 non-null  float64
 11  no_of_previous_bookings_not_canceled_log  36275 non-null  float64
 12  no_of_special_requests_log                36275 non-null  float64
 13  type_of_meal_plan_Meal Plan 2             36275 non-null  float64
 14  type_of_meal_plan_Meal Plan 3             36275 non-null  float64
 15  type_of_meal_plan_Not Selected            36275 non-null  float64
 16  room_type_reserved_Room_Type 2            36275 non-null  float64
 17  room_type_reserved_Room_Type 3            36275 non-null  float64
 18  room_type_reserved_Room_Type 4            36275 non-null  float64
 19  room_type_reserved_Room_Type 5            36275 non-null  float64
 20  room_type_reserved_Room_Type 6            36275 non-null  float64
 21  room_type_reserved_Room_Type 7            36275 non-null  float64
 22  market_segment_type_Complementary         36275 non-null  float64
 23  market_segment_type_Corporate             36275 non-null  float64
 24  market_segment_type_Offline               36275 non-null  float64
 25  market_segment_type_Online                36275 non-null  float64
 26  lead_time_y_short                         36275 non-null  float64
 27  lead_time_y_med                           36275 non-null  float64
 28  lead_time_y_long                          36275 non-null  float64
 29  lead_time_y_advanced                      36275 non-null  float64
dtypes: float64(30)
memory usage: 8.3 MB
# Using the SCIEM method I will split the train test data first.
X = data4.drop("booking_status", axis=1)
y = data4.pop("booking_status")
# adding a constant to the X variables
X = add_constant(X)
# Train/Test Split 70/30
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.30, random_state=1)
print("Number of rows in train data =", X_train.shape[0])
print("Number of rows in test data =", X_test.shape[0])
Number of rows in train data = 25392
Number of rows in test data = 10883
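The two row counts above can be checked by hand. The sketch below assumes the split rounds the test share up and gives the remainder to the training set; that rounding rule is an assumption about how `train_test_split` behaves, stated here only because it reproduces the printed numbers. The helper name `split_sizes` is hypothetical.

```python
import math


def split_sizes(n_rows, test_size):
    """Return (n_train, n_test), rounding the test share up (assumed rule)."""
    n_test = math.ceil(test_size * n_rows)
    n_train = n_rows - n_test
    return n_train, n_test


# 36275 rows in total, as reported by data4.info()
print(split_sizes(36275, 0.30))
```

Since 30% of 36275 is 10882.5, one of the two sets has to absorb the extra row, which is why the split is not exactly 70/30.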
print("Percentage of classes in training set:")
print(y_train.value_counts(normalize=True))
print("Percentage of classes in test set:")
print(y_test.value_counts(normalize=True))
Percentage of classes in training set:
0.0    0.670644
1.0    0.329356
Name: booking_status, dtype: float64
Percentage of classes in test set:
0.0    0.676376
1.0    0.323624
Name: booking_status, dtype: float64
X_train.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 25392 entries, 13662 to 33003
Data columns (total 30 columns):
 #   Column                                    Non-Null Count  Dtype
---  ------                                    --------------  -----
 0   const                                     25392 non-null  float64
 1   no_of_adults                              25392 non-null  float64
 2   no_of_week_nights                         25392 non-null  float64
 3   required_car_parking_space                25392 non-null  float64
 4   arrival_month                             25392 non-null  float64
 5   repeated_guest                            25392 non-null  float64
 6   avg_price_per_room                        25392 non-null  float64
 7   length_stay                               25392 non-null  float64
 8   no_of_children_log                        25392 non-null  float64
 9   no_of_weekend_nights_log                  25392 non-null  float64
 10  no_of_previous_cancellations_log          25392 non-null  float64
 11  no_of_previous_bookings_not_canceled_log  25392 non-null  float64
 12  no_of_special_requests_log                25392 non-null  float64
 13  type_of_meal_plan_Meal Plan 2             25392 non-null  float64
 14  type_of_meal_plan_Meal Plan 3             25392 non-null  float64
 15  type_of_meal_plan_Not Selected            25392 non-null  float64
 16  room_type_reserved_Room_Type 2            25392 non-null  float64
 17  room_type_reserved_Room_Type 3            25392 non-null  float64
 18  room_type_reserved_Room_Type 4            25392 non-null  float64
 19  room_type_reserved_Room_Type 5            25392 non-null  float64
 20  room_type_reserved_Room_Type 6            25392 non-null  float64
 21  room_type_reserved_Room_Type 7            25392 non-null  float64
 22  market_segment_type_Complementary         25392 non-null  float64
 23  market_segment_type_Corporate             25392 non-null  float64
 24  market_segment_type_Offline               25392 non-null  float64
 25  market_segment_type_Online                25392 non-null  float64
 26  lead_time_y_short                         25392 non-null  float64
 27  lead_time_y_med                           25392 non-null  float64
 28  lead_time_y_long                          25392 non-null  float64
 29  lead_time_y_advanced                      25392 non-null  float64
dtypes: float64(30)
memory usage: 6.0 MB
plt.figure(figsize=(20,10))
sns.heatmap(
data4.corr(), annot=True, vmin=-1, vmax=1, fmt='.2f')
<AxesSubplot:>
sns.pairplot(data4[['no_of_adults',
'required_car_parking_space',
'arrival_month',
'repeated_guest',
'avg_price_per_room',
'length_stay',
'no_of_children_log',
'no_of_previous_cancellations_log',
'no_of_previous_bookings_not_canceled_log',
'no_of_special_requests_log',
'market_segment_type_Complementary',
'market_segment_type_Corporate',
'market_segment_type_Offline',
'market_segment_type_Online',
'lead_time_y_short',
'lead_time_y_med',
'lead_time_y_long',
'lead_time_y_advanced']]);
# let's check the VIF of the predictors
vif_series = pd.Series(
[variance_inflation_factor(X_train.values, i) for i in range(X_train.shape[1])],
index=X_train.columns,
dtype=float,
)
print("VIF values: \n\n{}\n".format(vif_series))
VIF values: 

const                                     326.141919
no_of_adults                                1.346659
no_of_week_nights                         100.277464
required_car_parking_space                  1.041578
arrival_month                               1.051511
repeated_guest                              3.340040
avg_price_per_room                          1.936037
length_stay                               146.442538
no_of_children_log                          1.866322
no_of_weekend_nights_log                   34.428764
no_of_previous_cancellations_log            1.597137
no_of_previous_bookings_not_canceled_log    3.508907
no_of_special_requests_log                  1.267959
type_of_meal_plan_Meal Plan 2               1.217525
type_of_meal_plan_Meal Plan 3               1.025316
type_of_meal_plan_Not Selected              1.236534
room_type_reserved_Room_Type 2              1.090666
room_type_reserved_Room_Type 3              1.003381
room_type_reserved_Room_Type 4              1.364652
room_type_reserved_Room_Type 5              1.028015
room_type_reserved_Room_Type 6              1.858575
room_type_reserved_Room_Type 7              1.111002
market_segment_type_Complementary           4.507245
market_segment_type_Corporate              16.930019
market_segment_type_Offline                64.016661
market_segment_type_Online                 71.248287
lead_time_y_short                           1.119269
lead_time_y_med                             1.105320
lead_time_y_long                            1.151626
lead_time_y_advanced                        1.047117
dtype: float64
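For intuition on what these numbers mean: the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. The sketch below covers only the simplest special case, a predictor regressed on a single other predictor, where R² equals the squared Pearson correlation. The helper names and the toy values are hypothetical; the notebook's actual figures come from `variance_inflation_factor` applied to the full design matrix.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x, other):
    """VIF of x when the only other predictor is `other` (special case)."""
    r_squared = pearson_r(x, other) ** 2
    return 1.0 / (1.0 - r_squared)


# Hypothetical toy values, not taken from the data set.
week_nights = [1, 2, 3, 4, 5]
length_stay = [2, 4, 5, 4, 5]
print(round(vif_two_predictors(week_nights, length_stay), 2))
```

As the shared variance approaches 100%, the denominator approaches zero and the VIF grows without bound. That is the pattern seen for `length_stay`, `no_of_week_nights` and `no_of_weekend_nights_log`, which carry overlapping information.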
# dropping the weekend and week night counts because they are already combined into length_stay, and the market segment dummies because of their high VIF values
X_train1 = X_train.drop(['no_of_weekend_nights_log',
                         'no_of_week_nights',
                         'market_segment_type_Online',
                         'market_segment_type_Offline',
                         'market_segment_type_Corporate',
                         'market_segment_type_Complementary'],
                        axis=1)
logit = sm.Logit(y_train, X_train1.astype(float))
lg = logit.fit()
Optimization terminated successfully.
Current function value: 0.463427
Iterations 10
# print the logistic regression summary
print(lg.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 25392
Model: Logit Df Residuals: 25368
Method: MLE Df Model: 23
Date: Fri, 14 Jan 2022 Pseudo R-squ.: 0.2687
Time: 18:37:31 Log-Likelihood: -11767.
converged: True LL-Null: -16091.
Covariance Type: nonrobust LLR p-value: 0.000
============================================================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------------------------------------
const -3.6818 0.098 -37.650 0.000 -3.873 -3.490
no_of_adults 0.2321 0.035 6.614 0.000 0.163 0.301
required_car_parking_space -1.4537 0.135 -10.742 0.000 -1.719 -1.188
arrival_month -0.0668 0.006 -11.685 0.000 -0.078 -0.056
repeated_guest -2.6424 0.630 -4.193 0.000 -3.878 -1.407
avg_price_per_room 0.0229 0.001 33.788 0.000 0.022 0.024
length_stay 0.1088 0.009 11.946 0.000 0.091 0.127
no_of_children_log 0.5488 0.093 5.887 0.000 0.366 0.732
no_of_previous_cancellations_log 1.2323 0.490 2.515 0.012 0.272 2.193
no_of_previous_bookings_not_canceled_log -0.6731 0.477 -1.411 0.158 -1.608 0.262
no_of_special_requests_log -1.9180 0.044 -43.892 0.000 -2.004 -1.832
type_of_meal_plan_Meal Plan 2 -0.3480 0.056 -6.165 0.000 -0.459 -0.237
type_of_meal_plan_Meal Plan 3 1.7182 2.912 0.590 0.555 -3.989 7.425
type_of_meal_plan_Not Selected 0.8463 0.048 17.563 0.000 0.752 0.941
room_type_reserved_Room_Type 2 0.1288 0.123 1.045 0.296 -0.113 0.370
room_type_reserved_Room_Type 3 -0.2278 1.194 -0.191 0.849 -2.567 2.111
room_type_reserved_Room_Type 4 0.0548 0.050 1.095 0.273 -0.043 0.153
room_type_reserved_Room_Type 5 -0.9272 0.196 -4.735 0.000 -1.311 -0.543
room_type_reserved_Room_Type 6 -1.0662 0.135 -7.903 0.000 -1.331 -0.802
room_type_reserved_Room_Type 7 -1.8078 0.286 -6.331 0.000 -2.368 -1.248
lead_time_y_short 1.3167 0.039 34.051 0.000 1.241 1.393
lead_time_y_med 2.8622 0.058 49.315 0.000 2.748 2.976
lead_time_y_long 3.0529 0.077 39.428 0.000 2.901 3.205
lead_time_y_advanced 4.5673 0.247 18.478 0.000 4.083 5.052
============================================================================================================
# let's check the VIF of the predictors
vif_series = pd.Series(
[variance_inflation_factor(X_train1.values, i) for i in range(X_train1.shape[1])],
index=X_train1.columns,
dtype=float,
)
print("VIF values: \n\n{}\n".format(vif_series))
VIF values: 

const                                      29.389432
no_of_adults                                1.279036
required_car_parking_space                  1.037343
arrival_month                               1.045906
repeated_guest                              3.216850
avg_price_per_room                          1.583496
length_stay                                 1.076888
no_of_children_log                          1.855492
no_of_previous_cancellations_log            1.576794
no_of_previous_bookings_not_canceled_log    3.445330
no_of_special_requests_log                  1.133817
type_of_meal_plan_Meal Plan 2               1.134531
type_of_meal_plan_Meal Plan 3               1.018636
type_of_meal_plan_Not Selected              1.108819
room_type_reserved_Room_Type 2              1.079517
room_type_reserved_Room_Type 3              1.000877
room_type_reserved_Room_Type 4              1.317003
room_type_reserved_Room_Type 5              1.013508
room_type_reserved_Room_Type 6              1.833828
room_type_reserved_Room_Type 7              1.072268
lead_time_y_short                           1.105340
lead_time_y_med                             1.091932
lead_time_y_long                            1.122973
lead_time_y_advanced                        1.043737
dtype: float64
# training-set predictions at a 0.5 threshold
pred_train = (lg.predict(X_train1) > 0.5).astype(int)
X_train2 = X_train1.drop(['room_type_reserved_Room_Type 3'], axis=1)
X_train2.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 25392 entries, 13662 to 33003
Data columns (total 23 columns):
 #   Column                                    Non-Null Count  Dtype
---  ------                                    --------------  -----
 0   const                                     25392 non-null  float64
 1   no_of_adults                              25392 non-null  float64
 2   required_car_parking_space                25392 non-null  float64
 3   arrival_month                             25392 non-null  float64
 4   repeated_guest                            25392 non-null  float64
 5   avg_price_per_room                        25392 non-null  float64
 6   length_stay                               25392 non-null  float64
 7   no_of_children_log                        25392 non-null  float64
 8   no_of_previous_cancellations_log          25392 non-null  float64
 9   no_of_previous_bookings_not_canceled_log  25392 non-null  float64
 10  no_of_special_requests_log                25392 non-null  float64
 11  type_of_meal_plan_Meal Plan 2             25392 non-null  float64
 12  type_of_meal_plan_Meal Plan 3             25392 non-null  float64
 13  type_of_meal_plan_Not Selected            25392 non-null  float64
 14  room_type_reserved_Room_Type 2            25392 non-null  float64
 15  room_type_reserved_Room_Type 4            25392 non-null  float64
 16  room_type_reserved_Room_Type 5            25392 non-null  float64
 17  room_type_reserved_Room_Type 6            25392 non-null  float64
 18  room_type_reserved_Room_Type 7            25392 non-null  float64
 19  lead_time_y_short                         25392 non-null  float64
 20  lead_time_y_med                           25392 non-null  float64
 21  lead_time_y_long                          25392 non-null  float64
 22  lead_time_y_advanced                      25392 non-null  float64
dtypes: float64(23)
memory usage: 4.6 MB
logit = sm.Logit(y_train, X_train2.astype(float))
lg2 = logit.fit()
Optimization terminated successfully.
Current function value: 0.463428
Iterations 10
print(lg2.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 25392
Model: Logit Df Residuals: 25369
Method: MLE Df Model: 22
Date: Fri, 14 Jan 2022 Pseudo R-squ.: 0.2687
Time: 18:37:34 Log-Likelihood: -11767.
converged: True LL-Null: -16091.
Covariance Type: nonrobust LLR p-value: 0.000
============================================================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------------------------------------
const -3.6818 0.098 -37.651 0.000 -3.873 -3.490
no_of_adults 0.2321 0.035 6.614 0.000 0.163 0.301
required_car_parking_space -1.4536 0.135 -10.742 0.000 -1.719 -1.188
arrival_month -0.0668 0.006 -11.687 0.000 -0.078 -0.056
repeated_guest -2.6423 0.630 -4.193 0.000 -3.878 -1.407
avg_price_per_room 0.0229 0.001 33.789 0.000 0.022 0.024
length_stay 0.1089 0.009 11.948 0.000 0.091 0.127
no_of_children_log 0.5489 0.093 5.887 0.000 0.366 0.732
no_of_previous_cancellations_log 1.2322 0.490 2.515 0.012 0.272 2.193
no_of_previous_bookings_not_canceled_log -0.6731 0.477 -1.411 0.158 -1.608 0.262
no_of_special_requests_log -1.9180 0.044 -43.892 0.000 -2.004 -1.832
type_of_meal_plan_Meal Plan 2 -0.3479 0.056 -6.163 0.000 -0.459 -0.237
type_of_meal_plan_Meal Plan 3 1.7183 2.912 0.590 0.555 -3.988 7.425
type_of_meal_plan_Not Selected 0.8463 0.048 17.564 0.000 0.752 0.941
room_type_reserved_Room_Type 2 0.1289 0.123 1.046 0.296 -0.113 0.371
room_type_reserved_Room_Type 4 0.0549 0.050 1.097 0.273 -0.043 0.153
room_type_reserved_Room_Type 5 -0.9271 0.196 -4.735 0.000 -1.311 -0.543
room_type_reserved_Room_Type 6 -1.0662 0.135 -7.903 0.000 -1.331 -0.802
room_type_reserved_Room_Type 7 -1.8078 0.286 -6.331 0.000 -2.367 -1.248
lead_time_y_short 1.3166 0.039 34.051 0.000 1.241 1.392
lead_time_y_med 2.8622 0.058 49.315 0.000 2.748 2.976
lead_time_y_long 3.0529 0.077 39.429 0.000 2.901 3.205
lead_time_y_advanced 4.5673 0.247 18.478 0.000 4.083 5.052
============================================================================================================
X_train3 = X_train2.drop(['no_of_previous_bookings_not_canceled_log'], axis=1)
logit = sm.Logit(y_train, X_train3.astype(float))
lg3 = logit.fit()
Optimization terminated successfully.
Current function value: 0.463479
Iterations 9
print(lg3.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 25392
Model: Logit Df Residuals: 25370
Method: MLE Df Model: 21
Date: Fri, 14 Jan 2022 Pseudo R-squ.: 0.2686
Time: 18:37:34 Log-Likelihood: -11769.
converged: True LL-Null: -16091.
Covariance Type: nonrobust LLR p-value: 0.000
====================================================================================================
coef std err z P>|z| [0.025 0.975]
----------------------------------------------------------------------------------------------------
const -3.6853 0.098 -37.686 0.000 -3.877 -3.494
no_of_adults 0.2328 0.035 6.633 0.000 0.164 0.302
required_car_parking_space -1.4531 0.135 -10.738 0.000 -1.718 -1.188
arrival_month -0.0667 0.006 -11.679 0.000 -0.078 -0.056
repeated_guest -2.9666 0.574 -5.169 0.000 -4.092 -1.842
avg_price_per_room 0.0229 0.001 33.812 0.000 0.022 0.024
length_stay 0.1089 0.009 11.954 0.000 0.091 0.127
no_of_children_log 0.5492 0.093 5.890 0.000 0.366 0.732
no_of_previous_cancellations_log 0.9583 0.399 2.401 0.016 0.176 1.741
no_of_special_requests_log -1.9193 0.044 -43.924 0.000 -2.005 -1.834
type_of_meal_plan_Meal Plan 2 -0.3491 0.056 -6.183 0.000 -0.460 -0.238
type_of_meal_plan_Meal Plan 3 1.7180 2.914 0.590 0.555 -3.992 7.429
type_of_meal_plan_Not Selected 0.8466 0.048 17.568 0.000 0.752 0.941
room_type_reserved_Room_Type 2 0.1290 0.123 1.046 0.296 -0.113 0.371
room_type_reserved_Room_Type 4 0.0543 0.050 1.085 0.278 -0.044 0.152
room_type_reserved_Room_Type 5 -0.9291 0.196 -4.746 0.000 -1.313 -0.545
room_type_reserved_Room_Type 6 -1.0676 0.135 -7.913 0.000 -1.332 -0.803
room_type_reserved_Room_Type 7 -1.8098 0.286 -6.337 0.000 -2.370 -1.250
lead_time_y_short 1.3170 0.039 34.060 0.000 1.241 1.393
lead_time_y_med 2.8638 0.058 49.342 0.000 2.750 2.978
lead_time_y_long 3.0543 0.077 39.438 0.000 2.903 3.206
lead_time_y_advanced 4.5896 0.248 18.494 0.000 4.103 5.076
====================================================================================================
# let's check the VIF of the predictors again to see if any multicollinearity persists
vif_series = pd.Series(
[variance_inflation_factor(X_train3.values, i) for i in range(X_train3.shape[1])],
index=X_train3.columns,
dtype=float,
)
print("VIF values: \n\n{}\n".format(vif_series))
VIF values: 

const                                29.239515
no_of_adults                          1.276452
required_car_parking_space            1.036585
arrival_month                         1.044675
repeated_guest                        1.552463
avg_price_per_room                    1.578995
length_stay                           1.076638
no_of_children_log                    1.855384
no_of_previous_cancellations_log      1.426094
no_of_special_requests_log            1.128277
type_of_meal_plan_Meal Plan 2         1.134291
type_of_meal_plan_Meal Plan 3         1.018576
type_of_meal_plan_Not Selected        1.108682
room_type_reserved_Room_Type 2        1.079477
room_type_reserved_Room_Type 4        1.316395
room_type_reserved_Room_Type 5        1.012609
room_type_reserved_Room_Type 6        1.832724
room_type_reserved_Room_Type 7        1.071514
lead_time_y_short                     1.105261
lead_time_y_med                       1.091919
lead_time_y_long                      1.122907
lead_time_y_advanced                  1.043512
dtype: float64
X_train4 = X_train3.drop(['room_type_reserved_Room_Type 2'], axis=1)
logit = sm.Logit(y_train, X_train4.astype(float))
lg4 = logit.fit()
Optimization terminated successfully.
Current function value: 0.463500
Iterations 9
print(lg4.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 25392
Model: Logit Df Residuals: 25371
Method: MLE Df Model: 20
Date: Fri, 14 Jan 2022 Pseudo R-squ.: 0.2686
Time: 18:37:36 Log-Likelihood: -11769.
converged: True LL-Null: -16091.
Covariance Type: nonrobust LLR p-value: 0.000
====================================================================================================
coef std err z P>|z| [0.025 0.975]
----------------------------------------------------------------------------------------------------
const -3.6755 0.097 -37.773 0.000 -3.866 -3.485
no_of_adults 0.2313 0.035 6.592 0.000 0.162 0.300
required_car_parking_space -1.4496 0.135 -10.723 0.000 -1.715 -1.185
arrival_month -0.0669 0.006 -11.722 0.000 -0.078 -0.056
repeated_guest -2.9693 0.574 -5.173 0.000 -4.094 -1.844
avg_price_per_room 0.0229 0.001 33.816 0.000 0.022 0.024
length_stay 0.1090 0.009 11.964 0.000 0.091 0.127
no_of_children_log 0.5688 0.091 6.225 0.000 0.390 0.748
no_of_previous_cancellations_log 0.9574 0.399 2.398 0.016 0.175 1.740
no_of_special_requests_log -1.9178 0.044 -43.917 0.000 -2.003 -1.832
type_of_meal_plan_Meal Plan 2 -0.3510 0.056 -6.220 0.000 -0.462 -0.240
type_of_meal_plan_Meal Plan 3 1.7193 2.911 0.591 0.555 -3.987 7.425
type_of_meal_plan_Not Selected 0.8443 0.048 17.541 0.000 0.750 0.939
room_type_reserved_Room_Type 4 0.0528 0.050 1.056 0.291 -0.045 0.151
room_type_reserved_Room_Type 5 -0.9320 0.196 -4.761 0.000 -1.316 -0.548
room_type_reserved_Room_Type 6 -1.0857 0.134 -8.110 0.000 -1.348 -0.823
room_type_reserved_Room_Type 7 -1.8191 0.286 -6.369 0.000 -2.379 -1.259
lead_time_y_short 1.3172 0.039 34.067 0.000 1.241 1.393
lead_time_y_med 2.8676 0.058 49.483 0.000 2.754 2.981
lead_time_y_long 3.0549 0.077 39.444 0.000 2.903 3.207
lead_time_y_advanced 4.5912 0.248 18.498 0.000 4.105 5.078
====================================================================================================
X_train5 = X_train4.drop(['room_type_reserved_Room_Type 4'], axis=1)
logit = sm.Logit(y_train, X_train5.astype(float))
lg5 = logit.fit()
Optimization terminated successfully.
Current function value: 0.463522
Iterations 9
print(lg5.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 25392
Model: Logit Df Residuals: 25372
Method: MLE Df Model: 19
Date: Fri, 14 Jan 2022 Pseudo R-squ.: 0.2686
Time: 18:37:36 Log-Likelihood: -11770.
converged: True LL-Null: -16091.
Covariance Type: nonrobust LLR p-value: 0.000
====================================================================================================
coef std err z P>|z| [0.025 0.975]
----------------------------------------------------------------------------------------------------
const -3.7035 0.094 -39.515 0.000 -3.887 -3.520
no_of_adults 0.2398 0.034 7.023 0.000 0.173 0.307
required_car_parking_space -1.4500 0.135 -10.728 0.000 -1.715 -1.185
arrival_month -0.0672 0.006 -11.767 0.000 -0.078 -0.056
repeated_guest -2.9647 0.574 -5.163 0.000 -4.090 -1.839
avg_price_per_room 0.0231 0.001 35.959 0.000 0.022 0.024
length_stay 0.1100 0.009 12.154 0.000 0.092 0.128
no_of_children_log 0.5594 0.091 6.150 0.000 0.381 0.738
no_of_previous_cancellations_log 0.9559 0.400 2.392 0.017 0.173 1.739
no_of_special_requests_log -1.9157 0.044 -43.926 0.000 -2.001 -1.830
type_of_meal_plan_Meal Plan 2 -0.3601 0.056 -6.456 0.000 -0.469 -0.251
type_of_meal_plan_Meal Plan 3 1.7283 2.969 0.582 0.560 -4.090 7.547
type_of_meal_plan_Not Selected 0.8335 0.047 17.728 0.000 0.741 0.926
room_type_reserved_Room_Type 5 -0.9481 0.195 -4.858 0.000 -1.331 -0.566
room_type_reserved_Room_Type 6 -1.1095 0.132 -8.409 0.000 -1.368 -0.851
room_type_reserved_Room_Type 7 -1.8578 0.283 -6.556 0.000 -2.413 -1.302
lead_time_y_short 1.3157 0.039 34.057 0.000 1.240 1.391
lead_time_y_med 2.8630 0.058 49.547 0.000 2.750 2.976
lead_time_y_long 3.0495 0.077 39.450 0.000 2.898 3.201
lead_time_y_advanced 4.5867 0.248 18.479 0.000 4.100 5.073
====================================================================================================
# converting coefficients to odds ratios
odds = np.exp(lg5.params)
# adding the odds to a dataframe
pd.DataFrame(odds, X_train5.columns, columns=["odds"]).T
| | const | no_of_adults | required_car_parking_space | arrival_month | repeated_guest | avg_price_per_room | length_stay | no_of_children_log | no_of_previous_cancellations_log | no_of_special_requests_log | type_of_meal_plan_Meal Plan 2 | type_of_meal_plan_Meal Plan 3 | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | lead_time_y_short | lead_time_y_med | lead_time_y_long | lead_time_y_advanced |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| odds | 0.024636 | 1.271005 | 0.234579 | 0.935032 | 0.051575 | 1.023344 | 1.116331 | 1.749625 | 2.60091 | 0.147237 | 0.697598 | 5.631328 | 2.301302 | 0.387484 | 0.329734 | 0.156016 | 3.727461 | 17.514826 | 21.105724 | 98.170565 |
# finding the percentage change
perc_change_odds = (np.exp(lg5.params) - 1) * 100
# adding the change_odds% to a dataframe (indexed by the columns lg5 was fitted on)
pd.DataFrame(perc_change_odds, X_train5.columns, columns=["change_odds%"]).T
| | const | no_of_adults | required_car_parking_space | arrival_month | repeated_guest | avg_price_per_room | length_stay | no_of_children_log | no_of_previous_cancellations_log | no_of_special_requests_log | type_of_meal_plan_Meal Plan 2 | type_of_meal_plan_Meal Plan 3 | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | lead_time_y_short | lead_time_y_med | lead_time_y_long | lead_time_y_advanced |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| change_odds% | -97.53635 | 27.100515 | -76.542091 | -6.496819 | -94.842473 | 2.33444 | 11.6331 | 74.962538 | 160.091028 | -85.276282 | -30.240187 | 463.132781 | 130.13023 | -61.251587 | -67.026611 | -84.398434 | 272.74612 | 1651.48264 | 2010.57241 | 9717.0565 |
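As a worked check on how a logit coefficient turns into an odds ratio and a percentage change in odds, the sketch below applies `exp(coef)` and `(exp(coef) - 1) * 100` to two coefficients copied (rounded to four decimals) from the `lg5` summary. The helper names are hypothetical, and because the inputs are rounded, the results match the tables only to about two or three decimals.

```python
import math


def odds_ratio(coef):
    """Multiplicative change in the odds for a one-unit increase."""
    return math.exp(coef)


def percent_change_in_odds(coef):
    """The same effect expressed as a percentage change in the odds."""
    return (math.exp(coef) - 1) * 100


# Coefficients as printed in the lg5 summary (rounded to 4 decimals).
coefficients = {
    "lead_time_y_short": 1.3157,
    "required_car_parking_space": -1.4500,
}

for name, coef in coefficients.items():
    print(name, round(odds_ratio(coef), 3), round(percent_change_in_odds(coef), 1))
```

Read this way, a booking in the `short` lead-time bin has roughly 3.7 times the odds of cancellation of one in the baseline `lat_min` bin, holding the other predictors fixed, while requiring a parking space cuts the odds by roughly three quarters.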
# refitting the final model on the training set and predicting at a 0.5 threshold
logit = sm.Logit(y_train, X_train5.astype(float))
lg5 = logit.fit()
pred_train4 = lg5.predict(X_train5)
pred_train4 = np.round(pred_train4)
Optimization terminated successfully.
Current function value: 0.463522
Iterations 9
# another confusion matrix
cm = confusion_matrix(y_train, pred_train4)
plt.figure(figsize=(7, 5))
sns.heatmap(cm, annot=True, fmt="g")
plt.xlabel("Predicted Values")
plt.ylabel("Actual Values")
plt.show()
print("Accuracy on training set : ", accuracy_score(y_train, pred_train4))
Accuracy on training set : 0.7811515437933207
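An accuracy figure is easier to judge against a naive baseline. Since about 67% of training bookings are not canceled, a model that always predicts "not canceled" would already score about 0.67. The sketch below computes that majority-class baseline; the helper `majority_baseline_accuracy` and the toy labels are hypothetical and only mimic the class mix printed earlier.

```python
from collections import Counter


def majority_baseline_accuracy(y_true):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(y_true)
    return max(counts.values()) / len(y_true)


# Hypothetical labels with roughly the 67/33 class mix of the training set.
toy_labels = [0] * 67 + [1] * 33
print(majority_baseline_accuracy(toy_labels))
```

Against that baseline, a training accuracy of about 0.78 is a real but moderate improvement, which is one reason to look at the confusion matrix and the ROC curve rather than accuracy alone.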
logit_roc_auc_train = roc_auc_score(y_train, lg5.predict(X_train5))
fpr, tpr, thresholds = roc_curve(y_train, lg5.predict(X_train5))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_train)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()
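Each point on the ROC curve corresponds to one classification threshold. The sketch below computes a single (false positive rate, true positive rate) pair by hand for a given threshold, treating a score at or above the threshold as a predicted cancellation. The helper `roc_point` and the toy scores are hypothetical; the curve above is produced by `roc_curve`.

```python
def roc_point(y_true, scores, threshold):
    """Return (fpr, tpr) when scores >= threshold are predicted positive."""
    tp = fp = fn = tn = 0
    for actual, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn)   # share of actual positives that were caught
    fpr = fp / (fp + tn)   # share of actual negatives wrongly flagged
    return fpr, tpr


# Hypothetical labels and predicted probabilities.
toy_y = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]

for threshold in (0.5, 0.3):
    print(threshold, roc_point(toy_y, toy_scores, threshold))
```

Lowering the threshold moves the point up and to the right: more cancellations are caught, at the price of more non-canceled bookings being flagged.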
# dropping variables from test set as well which were dropped from training set
X_test1 = X_test.drop([ 'no_of_weekend_nights_log',
'no_of_week_nights',
'market_segment_type_Online',
'market_segment_type_Offline',
'market_segment_type_Corporate',
'market_segment_type_Complementary',
'room_type_reserved_Room_Type 3',
'room_type_reserved_Room_Type 4',
'no_of_previous_bookings_not_canceled_log',
'room_type_reserved_Room_Type 2'
], axis=1)
pred_test = (lg5.predict(X_test1) > 0.5).astype(int)
print("Accuracy on training set : ", accuracy_score(y_train, pred_train4))
print("Accuracy on test set : ", accuracy_score(y_test, pred_test))
Accuracy on training set : 0.7811515437933207
Accuracy on test set : 0.7846182118901038
tree_data = dummy_data.astype(float)
tree_data = tree_data.drop(['arrival_date','arrival_year','no_of_week_nights',
'no_of_weekend_nights_log' ], axis=1)
X = tree_data.drop("booking_status" , axis=1)
y = tree_data.pop("booking_status")
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.30, random_state=1)
# building a decision tree with DecisionTreeClassifier (Gini criterion)
dTree = DecisionTreeClassifier(criterion = 'gini', random_state=1)
dTree.fit(X_train, y_train)
DecisionTreeClassifier(random_state=1)
#scoring the accuracy on train & test data
print("Accuracy on training set : ",dTree.score(X_train, y_train))
print("Accuracy on test set : ",dTree.score(X_test, y_test))
Accuracy on training set : 0.9924385633270322
Accuracy on test set : 0.8585867867315997
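The gap between roughly 0.99 on the training set and 0.86 on the test set suggests that the fully grown tree is overfitting. As a sketch of how such a gap can be inspected, the snippet below fits an unrestricted tree and a depth-limited tree on synthetic data from `make_classification` and reports both accuracies. The dataset, the depth of 3, and the variable names are illustrative assumptions, not this notebook's own tuning procedure.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (illustrative only)
X_demo, y_demo = make_classification(n_samples=1000, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=1
)

results = {}
for depth in (None, 3):  # None lets the tree grow until the leaves are pure
    clf = DecisionTreeClassifier(criterion="gini", max_depth=depth, random_state=1)
    clf.fit(X_tr, y_tr)
    results[depth] = (clf.score(X_tr, y_tr), clf.score(X_te, y_te))

for depth, (train_acc, test_acc) in results.items():
    print(f"max_depth={depth}: train={train_acc:.3f}, test={test_acc:.3f}, "
          f"gap={train_acc - test_acc:.3f}")
```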
# checking the positive outcomes
y.sum(axis = 0)
11885.0
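The sum above counts the rows labelled 1 in `booking_status`. The share of that class is often easier to interpret than the raw count, especially when judging accuracy and recall. A minimal sketch on a toy series (the values are made up):

```python
import pandas as pd

# Toy 0/1 target standing in for booking_status (illustrative only)
y_toy = pd.Series([1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0])

positives = y_toy.sum()        # number of rows labelled 1
positive_share = y_toy.mean()  # fraction of rows labelled 1
class_shares = y_toy.value_counts(normalize=True)  # share of each class

print(positives, positive_share)
print(class_shares)
```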
## Function to create confusion matrix
def make_confusion_matrix(model, y_actual, labels=[1, 0]):
    '''
    model : classifier to predict values of X
    y_actual : ground truth for X_test

    Predictions are always made on the global X_test. The `labels` argument is
    not used: the matrix is ordered [0, 1] to match the axis names below.
    '''
    y_predict = model.predict(X_test)
    cm = metrics.confusion_matrix(y_actual, y_predict, labels=[0, 1])
    df_cm = pd.DataFrame(cm, index=["Actual - No", "Actual - Yes"],
                         columns=["Predicted - No", "Predicted - Yes"])
    # cell annotations: count on the first line, share of all observations on the second
    group_counts = ["{0:0.0f}".format(value) for value in cm.flatten()]
    group_percentages = ["{0:.2%}".format(value) for value in cm.flatten() / np.sum(cm)]
    annot_labels = [f"{v1}\n{v2}" for v1, v2 in zip(group_counts, group_percentages)]
    annot_labels = np.asarray(annot_labels).reshape(2, 2)
    plt.figure(figsize=(10, 7))
    sns.heatmap(df_cm, annot=annot_labels, fmt='')
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
## Function to calculate recall score
def get_recall_score(model):
    '''
    model : classifier to predict values of X
    '''
    pred_train = model.predict(X_train)
    pred_test = model.predict(X_test)
    print("Recall on training set : ", metrics.recall_score(y_train, pred_train))
    print("Recall on test set : ", metrics.recall_score(y_test, pred_test))
# confusion matrix for the decision tree on the test set
make_confusion_matrix(dTree,y_test)
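`make_confusion_matrix` always predicts on the global `X_test`, so it cannot be reused for the training set or for another split. A sketch of a variant that takes the feature matrix explicitly and returns the counts as a labelled DataFrame is shown below. `confusion_table` and `ParityModel` are hypothetical names introduced only for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix


def confusion_table(model, X, y_actual):
    """Return a 2x2 confusion matrix for `model` on (X, y_actual) as a DataFrame."""
    y_predict = model.predict(X)
    cm = confusion_matrix(y_actual, y_predict, labels=[0, 1])
    return pd.DataFrame(
        cm,
        index=["Actual - No", "Actual - Yes"],
        columns=["Predicted - No", "Predicted - Yes"],
    )


class ParityModel:
    """Hypothetical stand-in classifier: predicts 1 for odd inputs, 0 for even."""

    def predict(self, X):
        return np.asarray(X).ravel() % 2


X_toy = np.array([[1], [2], [3], [4], [5], [6]])
y_toy = np.array([1, 0, 0, 0, 1, 1])
table = confusion_table(ParityModel(), X_toy, y_toy)
print(table)
```

With a fitted classifier, the same function could be called once with the training split and once with the test split to compare the two matrices.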
# check the recall on the train and test.
get_recall_score(dTree)
Recall on training set : 0.9817051297381323
Recall on test set : 0.7921635434412265
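Recall is the share of actual positives that the model identifies, TP / (TP + FN). The sketch below, on made-up labels, derives it from the confusion-matrix cells and compares the result with `recall_score`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Made-up labels and predictions (illustrative only)
y_true = np.array([1, 0, 0, 0, 1, 1, 1, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0, 0, 0])

# With labels=[0, 1], ravel() yields the cells in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

recall_manual = tp / (tp + fn)
recall_sklearn = recall_score(y_true, y_pred)
print(recall_manual, recall_sklearn)
```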
the_features = list(X.columns)
print(the_features)
['no_of_adults', 'required_car_parking_space', 'lead_time', 'arrival_month', 'repeated_guest', 'avg_price_per_room', 'length_stay', 'no_of_children_log', 'no_of_previous_cancellations_log', 'no_of_previous_bookings_not_canceled_log', 'no_of_special_requests_log', 'type_of_meal_plan_Meal Plan 2', 'type_of_meal_plan_Meal Plan 3', 'type_of_meal_plan_Not Selected', 'room_type_reserved_Room_Type 2', 'room_type_reserved_Room_Type 3', 'room_type_reserved_Room_Type 4', 'room_type_reserved_Room_Type 5', 'room_type_reserved_Room_Type 6', 'room_type_reserved_Room_Type 7', 'market_segment_type_Complementary', 'market_segment_type_Corporate', 'market_segment_type_Offline', 'market_segment_type_Online']
plt.figure(figsize=(20,30))
tree.plot_tree(dTree,feature_names=the_features,filled=True,fontsize=9,node_ids=True,class_names=True)
plt.show()
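A fully grown tree is hard to read both as a plot and as text. Both `tree.plot_tree` and `tree.export_text` accept a `max_depth` argument that limits how many levels are drawn or printed, and `export_text` marks the cut-off subtrees as `truncated branch of depth N`. A minimal sketch on synthetic data, where the dataset and the feature names `f0` to `f5` are illustrative assumptions:

```python
from sklearn import tree
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data and hypothetical feature names (illustrative only)
X_demo, y_demo = make_classification(n_samples=500, n_features=6, random_state=1)
feature_names = [f"f{i}" for i in range(X_demo.shape[1])]

clf = DecisionTreeClassifier(criterion="gini", random_state=1).fit(X_demo, y_demo)

# Print only the top levels; deeper subtrees are summarised as truncated branches
short_text = tree.export_text(
    clf, feature_names=feature_names, max_depth=2, show_weights=True
)
# Raise max_depth to the tree's actual depth to print every node
full_text = tree.export_text(
    clf, feature_names=feature_names, max_depth=clf.get_depth(), show_weights=True
)

print(short_text)
print(len(short_text.splitlines()), "lines vs", len(full_text.splitlines()))
```

In the notebook, the same argument could be passed as `tree.export_text(dTree, feature_names=the_features, max_depth=3, show_weights=True)` to get a summary that fits on one screen.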
print(tree.export_text(dTree,feature_names=the_features,show_weights=True))
|--- lead_time <= 151.50 | |--- no_of_special_requests_log <= 0.35 | | |--- market_segment_type_Online <= 0.50 | | | |--- lead_time <= 90.50 | | | | |--- length_stay <= 5.50 | | | | | |--- avg_price_per_room <= 201.50 | | | | | | |--- lead_time <= 74.50 | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | |--- lead_time <= 1.50 | | | | | | | | | |--- avg_price_per_room <= 62.00 | | | | | | | | | | |--- avg_price_per_room <= 57.50 | | | | | | | | | | | |--- weights: [15.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 57.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- avg_price_per_room > 62.00 | | | | | | | | | | |--- avg_price_per_room <= 151.59 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- avg_price_per_room > 151.59 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | |--- lead_time > 1.50 | | | | | | | | | |--- lead_time <= 59.50 | | | | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | |--- lead_time > 59.50 | | | | | | | | | | |--- avg_price_per_room <= 138.00 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- avg_price_per_room > 138.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- arrival_month > 5.50 | | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | |--- repeated_guest <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 17 | | | | | | | | | | |--- repeated_guest > 0.50 | | | | | | | | | | | |--- weights: [169.00, 0.00] class: 0.0 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- lead_time <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | 
| | |--- lead_time > 9.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | | |--- avg_price_per_room <= 50.00 | | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | | |--- weights: [19.00, 0.00] class: 0.0 | | | | | | | | | |--- avg_price_per_room > 50.00 | | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 17 | | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | |--- lead_time > 74.50 | | | | | | | |--- lead_time <= 78.50 | | | | | | | | |--- avg_price_per_room <= 95.47 | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | |--- avg_price_per_room <= 69.85 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 69.85 | | | | | | | | | | | |--- weights: [0.00, 9.00] class: 1.0 | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | | |--- weights: [26.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 95.47 | | | | | | | | | |--- length_stay <= 3.50 | | | | | | | | | | |--- avg_price_per_room <= 120.24 | | | | | | | | | | | |--- weights: [0.00, 30.00] class: 1.0 | | | | | | | | | | |--- avg_price_per_room > 120.24 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | | | |--- length_stay > 3.50 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | | | | |--- lead_time > 78.50 | | | | | | | | |--- length_stay <= 3.50 | | | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | | | |--- length_stay <= 2.50 | | | | | | | | | | | |--- weights: [110.00, 0.00] class: 0.0 | | | | | | | | | | |--- length_stay > 2.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | | | | |--- 
lead_time <= 86.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time > 86.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- length_stay > 3.50 | | | | | | | | | |--- avg_price_per_room <= 66.75 | | | | | | | | | | |--- avg_price_per_room <= 63.25 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 63.25 | | | | | | | | | | | |--- weights: [0.00, 7.00] class: 1.0 | | | | | | | | | |--- avg_price_per_room > 66.75 | | | | | | | | | | |--- avg_price_per_room <= 73.53 | | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 73.53 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | |--- avg_price_per_room > 201.50 | | | | | | |--- arrival_month <= 10.50 | | | | | | | |--- weights: [0.00, 17.00] class: 1.0 | | | | | | |--- arrival_month > 10.50 | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | |--- length_stay > 5.50 | | | | | |--- avg_price_per_room <= 115.50 | | | | | | |--- length_stay <= 14.50 | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | |--- lead_time <= 3.50 | | | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | |--- lead_time > 3.50 | | | | | | | | | |--- arrival_month <= 6.50 | | | | | | | | | | |--- length_stay <= 11.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- length_stay > 11.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- arrival_month > 6.50 | | | | | | | | | | |--- lead_time <= 75.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 75.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | |--- arrival_month > 7.50 | | | | | | 
| | |--- avg_price_per_room <= 70.42 | | | | | | | | | |--- weights: [34.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 70.42 | | | | | | | | | |--- avg_price_per_room <= 71.42 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- avg_price_per_room > 71.42 | | | | | | | | | | |--- avg_price_per_room <= 80.38 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- avg_price_per_room > 80.38 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | |--- length_stay > 14.50 | | | | | | | |--- weights: [0.00, 7.00] class: 1.0 | | | | | |--- avg_price_per_room > 115.50 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | |--- length_stay <= 10.00 | | | | | | | | | |--- weights: [0.00, 43.00] class: 1.0 | | | | | | | | |--- length_stay > 10.00 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | |--- lead_time > 90.50 | | | | |--- lead_time <= 117.50 | | | | | |--- avg_price_per_room <= 93.58 | | | | | | |--- avg_price_per_room <= 75.07 | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | |--- avg_price_per_room <= 58.75 | | | | | | | | | |--- weights: [14.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 58.75 | | | | | | | | | |--- length_stay <= 3.50 | | | | | | | | | | |--- lead_time <= 104.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- lead_time > 104.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- length_stay > 3.50 | | | | | | | | | | |--- avg_price_per_room <= 61.50 | | | | | | | | | | | |--- 
truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 61.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | |--- arrival_month > 7.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- avg_price_per_room <= 66.50 | | | | | | | | | | |--- length_stay <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- length_stay > 3.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- avg_price_per_room > 66.50 | | | | | | | | | | |--- length_stay <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- length_stay > 4.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [32.00, 0.00] class: 0.0 | | | | | | |--- avg_price_per_room > 75.07 | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | |--- avg_price_per_room <= 88.50 | | | | | | | | | |--- length_stay <= 1.50 | | | | | | | | | | |--- avg_price_per_room <= 80.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 80.50 | | | | | | | | | | | |--- weights: [23.00, 0.00] class: 0.0 | | | | | | | | | |--- length_stay > 1.50 | | | | | | | | | | |--- weights: [50.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 88.50 | | | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | |--- arrival_month > 3.50 | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | |--- avg_price_per_room <= 80.38 | | | | | | | | | | |--- weights: [0.00, 11.00] class: 1.0 | | | | | | | | | |--- avg_price_per_room > 80.38 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | 
| | | |--- avg_price_per_room <= 86.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 86.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- lead_time <= 112.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- lead_time > 112.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | |--- avg_price_per_room > 93.58 | | | | | | |--- no_of_adults <= 1.50 | | | | | | | |--- length_stay <= 3.50 | | | | | | | | |--- avg_price_per_room <= 117.50 | | | | | | | | | |--- repeated_guest <= 0.50 | | | | | | | | | | |--- weights: [0.00, 59.00] class: 1.0 | | | | | | | | | |--- repeated_guest > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 117.50 | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | |--- length_stay <= 2.50 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0.0 | | | | | | | | | | |--- length_stay > 2.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | |--- length_stay <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- length_stay > 2.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | |--- length_stay > 3.50 | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | | | |--- no_of_adults > 1.50 | | | | | | | |--- avg_price_per_room <= 108.50 | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | |--- avg_price_per_room <= 101.12 | | | | | | | | | | |--- lead_time <= 110.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- lead_time > 110.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- avg_price_per_room > 101.12 | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0.0 | | | | | | | | |--- arrival_month > 9.50 | | | 
| | | | | | |--- no_of_adults <= 2.50 | | | | | | | | | | |--- weights: [0.00, 47.00] class: 1.0 | | | | | | | | | |--- no_of_adults > 2.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | |--- avg_price_per_room > 108.50 | | | | | | | | |--- lead_time <= 104.00 | | | | | | | | | |--- avg_price_per_room <= 177.83 | | | | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | | | | | |--- weights: [45.00, 0.00] class: 0.0 | | | | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- avg_price_per_room > 177.83 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | |--- lead_time > 104.00 | | | | | | | | | |--- avg_price_per_room <= 110.86 | | | | | | | | | | |--- weights: [0.00, 12.00] class: 1.0 | | | | | | | | | |--- avg_price_per_room > 110.86 | | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | |--- lead_time > 117.50 | | | | | |--- no_of_adults <= 1.50 | | | | | | |--- avg_price_per_room <= 122.00 | | | | | | | |--- weights: [141.00, 0.00] class: 0.0 | | | | | | |--- avg_price_per_room > 122.00 | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | |--- no_of_adults > 1.50 | | | | | | |--- avg_price_per_room <= 89.88 | | | | | | | |--- lead_time <= 125.50 | | | | | | | | |--- arrival_month <= 6.50 | | | | | | | | | |--- lead_time <= 123.50 | | | | | | | | | | |--- avg_price_per_room <= 82.88 | | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 82.88 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- lead_time > 123.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | |--- arrival_month > 6.50 | | | | | | | | | |--- lead_time <= 122.00 | | | | | | | | | | 
|--- avg_price_per_room <= 63.12 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 63.12 | | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1.0 | | | | | | | | | |--- lead_time > 122.00 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | |--- lead_time > 125.50 | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [65.00, 0.00] class: 0.0 | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | |--- length_stay <= 2.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | |--- length_stay > 2.50 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | |--- avg_price_per_room > 89.88 | | | | | | | |--- avg_price_per_room <= 96.45 | | | | | | | | |--- avg_price_per_room <= 94.75 | | | | | | | | | |--- lead_time <= 125.50 | | | | | | | | | | |--- length_stay <= 2.50 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | | | | | | | |--- length_stay > 2.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- lead_time > 125.50 | | | | | | | | | | |--- lead_time <= 138.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time > 138.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- avg_price_per_room > 94.75 | | | | | | | | | |--- arrival_month <= 7.00 | | | | | | | | | | |--- weights: [0.00, 11.00] class: 1.0 | | | | | | | | | |--- arrival_month > 7.00 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | |--- avg_price_per_room > 96.45 | | | | | | | | |--- length_stay <= 1.50 | | | 
| | | | | | |--- weights: [0.00, 6.00] class: 1.0 | | | | | | | | |--- length_stay > 1.50 | | | | | | | | | |--- lead_time <= 150.50 | | | | | | | | | | |--- avg_price_per_room <= 97.41 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 97.41 | | | | | | | | | | | |--- weights: [60.00, 0.00] class: 0.0 | | | | | | | | | |--- lead_time > 150.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | |--- market_segment_type_Online > 0.50 | | | |--- lead_time <= 13.50 | | | | |--- avg_price_per_room <= 202.67 | | | | | |--- lead_time <= 3.50 | | | | | | |--- arrival_month <= 5.50 | | | | | | | |--- length_stay <= 6.50 | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | |--- weights: [67.00, 0.00] class: 0.0 | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | |--- lead_time <= 0.50 | | | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- lead_time > 0.50 | | | | | | | | | | |--- length_stay <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- length_stay > 2.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | |--- length_stay > 6.50 | | | | | | | | |--- weights: [0.00, 4.00] class: 1.0 | | | | | | |--- arrival_month > 5.50 | | | | | | | |--- length_stay <= 12.00 | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | |--- avg_price_per_room <= 76.35 | | | | | | | | | | |--- avg_price_per_room <= 74.40 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 74.40 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- avg_price_per_room > 76.35 | | | | | | | | | | |--- avg_price_per_room <= 118.04 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- avg_price_per_room > 118.04 | | 
| | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | |--- avg_price_per_room <= 178.00 | | | | | | | | | | |--- lead_time <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- lead_time > 1.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- avg_price_per_room > 178.00 | | | | | | | | | | |--- lead_time <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | | |--- lead_time > 0.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | |--- length_stay > 12.00 | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | |--- lead_time > 3.50 | | | | | | |--- avg_price_per_room <= 99.38 | | | | | | | |--- avg_price_per_room <= 78.90 | | | | | | | | |--- length_stay <= 15.00 | | | | | | | | | |--- length_stay <= 7.50 | | | | | | | | | | |--- length_stay <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- length_stay > 1.50 | | | | | | | | | | | |--- weights: [84.00, 0.00] class: 0.0 | | | | | | | | | |--- length_stay > 7.50 | | | | | | | | | | |--- lead_time <= 7.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | | |--- lead_time > 7.50 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0.0 | | | | | | | | |--- length_stay > 15.00 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | |--- avg_price_per_room > 78.90 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | |--- weights: [23.00, 0.00] class: 0.0 | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | |--- length_stay <= 6.50 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | | |--- length_stay > 6.50 | | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1.0 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [42.00, 0.00] class: 0.0 | | | | 
| | |--- avg_price_per_room > 99.38 | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | |--- avg_price_per_room <= 119.25 | | | | | | | | | | |--- avg_price_per_room <= 117.25 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- avg_price_per_room > 117.25 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- avg_price_per_room > 119.25 | | | | | | | | | | |--- avg_price_per_room <= 129.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- avg_price_per_room > 129.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | |--- weights: [5.00, 0.00] class: 0.0 | | | | | | | |--- arrival_month > 8.50 | | | | | | | | |--- lead_time <= 9.50 | | | | | | | | | |--- lead_time <= 5.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- lead_time > 5.50 | | | | | | | | | | |--- avg_price_per_room <= 160.17 | | | | | | | | | | | |--- weights: [41.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 160.17 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- lead_time > 9.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- lead_time <= 10.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- lead_time > 10.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [10.00, 0.00] class: 0.0 | | | | |--- avg_price_per_room > 202.67 | | | | | |--- arrival_month <= 11.50 | | | | | | |--- weights: [0.00, 32.00] class: 1.0 | | | | | |--- arrival_month > 11.50 | | | | | | |--- weights: [1.00, 0.00] class: 
0.0 | | | |--- lead_time > 13.50 | | | | |--- avg_price_per_room <= 105.27 | | | | | |--- avg_price_per_room <= 60.07 | | | | | | |--- lead_time <= 84.50 | | | | | | | |--- lead_time <= 51.50 | | | | | | | | |--- lead_time <= 50.50 | | | | | | | | | |--- avg_price_per_room <= 21.67 | | | | | | | | | | |--- weights: [19.00, 0.00] class: 0.0 | | | | | | | | | |--- avg_price_per_room > 21.67 | | | | | | | | | | |--- avg_price_per_room <= 49.84 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 49.84 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- lead_time > 50.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | |--- lead_time > 51.50 | | | | | | | | |--- weights: [32.00, 0.00] class: 0.0 | | | | | | |--- lead_time > 84.50 | | | | | | | |--- lead_time <= 87.50 | | | | | | | | |--- weights: [0.00, 3.00] class: 1.0 | | | | | | | |--- lead_time > 87.50 | | | | | | | | |--- length_stay <= 8.00 | | | | | | | | | |--- avg_price_per_room <= 59.43 | | | | | | | | | | |--- arrival_month <= 6.50 | | | | | | | | | | | |--- weights: [12.00, 0.00] class: 0.0 | | | | | | | | | | |--- arrival_month > 6.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- avg_price_per_room > 59.43 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | |--- length_stay > 8.00 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | |--- avg_price_per_room > 60.07 | | | | | | |--- lead_time <= 25.50 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | |--- weights: [29.00, 0.00] class: 0.0 | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | | |--- avg_price_per_room <= 69.16 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- avg_price_per_room > 69.16 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | 
| | |--- arrival_month > 4.50 | | | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- weights: [54.00, 0.00] class: 0.0 | | | | | | |--- lead_time > 25.50 | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | | |--- avg_price_per_room <= 71.92 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- avg_price_per_room > 71.92 | | | | | | | | | | | |--- truncated branch of depth 23 | | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | | |--- weights: [15.00, 0.00] class: 0.0 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | |--- arrival_month <= 5.00 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | | | |--- arrival_month > 5.00 | | | | | | | | | | |--- length_stay <= 3.50 | | | | | | | | | | | |--- weights: [0.00, 35.00] class: 1.0 | | | | | | | | | | |--- length_stay > 3.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- avg_price_per_room <= 90.20 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- avg_price_per_room > 90.20 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- avg_price_per_room <= 74.53 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | | |--- avg_price_per_room > 74.53 | | | | | | | | | | | |--- truncated branch of depth 17 | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | |--- 
weights: [6.00, 0.00] class: 0.0 | | | | |--- avg_price_per_room > 105.27 | | | | | |--- required_car_parking_space <= 0.50 | | | | | | |--- arrival_month <= 10.50 | | | | | | | |--- avg_price_per_room <= 195.30 | | | | | | | | |--- lead_time <= 54.50 | | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | | |--- lead_time <= 38.50 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | | | |--- lead_time > 38.50 | | | | | | | | | | | |--- truncated branch of depth 17 | | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | |--- lead_time > 54.50 | | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | | |--- lead_time <= 135.50 | | | | | | | | | | | |--- truncated branch of depth 21 | | | | | | | | | | |--- lead_time > 135.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | | |--- lead_time <= 59.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- lead_time > 59.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | |--- avg_price_per_room > 195.30 | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- lead_time <= 59.50 | | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1.0 | | | | | | | | | | |--- lead_time > 59.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- weights: [0.00, 92.00] class: 1.0 | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | |--- arrival_month > 10.50 | | | | | | | |--- lead_time <= 22.50 | | | | | | | | |--- no_of_adults <= 1.50 
| | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1.0 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | |--- weights: [22.00, 0.00] class: 0.0 | | | | | | | |--- lead_time > 22.50 | | | | | | | | |--- avg_price_per_room <= 168.06 | | | | | | | | | |--- avg_price_per_room <= 147.75 | | | | | | | | | | |--- length_stay <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- length_stay > 3.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | |--- avg_price_per_room > 147.75 | | | | | | | | | | |--- weights: [0.00, 15.00] class: 1.0 | | | | | | | | |--- avg_price_per_room > 168.06 | | | | | | | | | |--- length_stay <= 8.50 | | | | | | | | | | |--- lead_time <= 80.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 80.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- length_stay > 8.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | |--- required_car_parking_space > 0.50 | | | | | | |--- length_stay <= 11.00 | | | | | | | |--- weights: [39.00, 0.00] class: 0.0 | | | | | | |--- length_stay > 11.00 | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | |--- no_of_special_requests_log > 0.35 | | |--- no_of_special_requests_log <= 0.90 | | | |--- market_segment_type_Online <= 0.50 | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | |--- lead_time <= 102.50 | | | | | | |--- length_stay <= 15.00 | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | | |--- lead_time <= 91.50 | | | | | | | | | |--- avg_price_per_room <= 129.50 | | | | | | | | | | |--- weights: [848.00, 0.00] class: 0.0 | | | | | | | | | |--- avg_price_per_room > 129.50 | | | | | | | | | | |--- avg_price_per_room <= 131.50 | | | | | | | | | | | |--- truncated branch of depth 
2 | | | | | | | | | | |--- avg_price_per_room > 131.50 | | | | | | | | | | | |--- weights: [27.00, 0.00] class: 0.0 | | | | | | | | |--- lead_time > 91.50 | | | | | | | | | |--- no_of_children_log <= 0.35 | | | | | | | | | | |--- weights: [43.00, 0.00] class: 0.0 | | | | | | | | | |--- no_of_children_log > 0.35 | | | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | | |--- length_stay <= 4.50 | | | | | | | | | |--- weights: [12.00, 0.00] class: 0.0 | | | | | | | | |--- length_stay > 4.50 | | | | | | | | | |--- lead_time <= 35.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | |--- lead_time > 35.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | |--- length_stay > 15.00 | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | |--- lead_time > 102.50 | | | | | | |--- lead_time <= 104.50 | | | | | | | |--- lead_time <= 103.50 | | | | | | | | |--- no_of_children_log <= 0.35 | | | | | | | | | |--- weights: [5.00, 0.00] class: 0.0 | | | | | | | | |--- no_of_children_log > 0.35 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | |--- lead_time > 103.50 | | | | | | | | |--- weights: [0.00, 3.00] class: 1.0 | | | | | | |--- lead_time > 104.50 | | | | | | | |--- avg_price_per_room <= 141.75 | | | | | | | | |--- lead_time <= 150.50 | | | | | | | | | |--- length_stay <= 3.50 | | | | | | | | | | |--- avg_price_per_room <= 81.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- avg_price_per_room > 81.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- length_stay > 3.50 | | | | | | | | | | |--- weights: [27.00, 0.00] class: 0.0 | | | | | | | | |--- lead_time > 150.50 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 
0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | |--- avg_price_per_room > 141.75 | | | | | | | | |--- lead_time <= 110.50 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | |--- lead_time > 110.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | |--- lead_time <= 63.00 | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | |--- weights: [18.00, 0.00] class: 0.0 | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | |--- length_stay <= 1.50 | | | | | | | | |--- weights: [2.00, 1.00] class: 0.0 | | | | | | | |--- length_stay > 1.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | |--- lead_time > 63.00 | | | | | | |--- weights: [0.00, 6.00] class: 1.0 | | | |--- market_segment_type_Online > 0.50 | | | | |--- lead_time <= 8.50 | | | | | |--- lead_time <= 4.50 | | | | | | |--- length_stay <= 14.00 | | | | | | | |--- avg_price_per_room <= 219.86 | | | | | | | | |--- length_stay <= 6.50 | | | | | | | | | |--- avg_price_per_room <= 157.64 | | | | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- avg_price_per_room > 157.64 | | | | | | | | | | |--- avg_price_per_room <= 158.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | | |--- avg_price_per_room > 158.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | |--- length_stay > 6.50 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0.0 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- arrival_month <= 
10.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | |--- avg_price_per_room > 219.86 | | | | | | | | |--- arrival_month <= 6.00 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | |--- arrival_month > 6.00 | | | | | | | | | |--- avg_price_per_room <= 237.25 | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0.0 | | | | | | | | | |--- avg_price_per_room > 237.25 | | | | | | | | | | |--- room_type_reserved_Room_Type 6 <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | | |--- room_type_reserved_Room_Type 6 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | |--- length_stay > 14.00 | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | |--- lead_time > 4.50 | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | | |--- avg_price_per_room <= 123.60 | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | |--- avg_price_per_room <= 88.76 | | | | | | | | | | |--- length_stay <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- length_stay > 1.50 | | | | | | | | | | | |--- weights: [32.00, 0.00] class: 0.0 | | | | | | | | | |--- avg_price_per_room > 88.76 | | | | | | | | | | |--- avg_price_per_room <= 91.22 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- avg_price_per_room > 91.22 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | |--- weights: [95.00, 0.00] class: 0.0 | | | | | | | |--- avg_price_per_room > 123.60 | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | |--- avg_price_per_room <= 124.05 | | | | | | | | | | |--- length_stay <= 1.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | | |--- length_stay > 1.50 | | | | | | | | | | | |--- weights: 
[2.00, 0.00] class: 0.0 | | | | | | | | | |--- avg_price_per_room > 124.05 | | | | | | | | | | |--- arrival_month <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_month > 2.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | |--- length_stay <= 1.50 | | | | | | | | | | |--- avg_price_per_room <= 128.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 128.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- length_stay > 1.50 | | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | | |--- weights: [14.00, 0.00] class: 0.0 | | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | | |--- length_stay <= 3.50 | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | |--- length_stay > 3.50 | | | | | | | | |--- weights: [0.00, 3.00] class: 1.0 | | | | |--- lead_time > 8.50 | | | | | |--- required_car_parking_space <= 0.50 | | | | | | |--- avg_price_per_room <= 127.62 | | | | | | | |--- lead_time <= 43.50 | | | | | | | | |--- length_stay <= 9.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | | |--- weights: [87.00, 0.00] class: 0.0 | | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | | |--- truncated branch of depth 23 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [127.00, 0.00] class: 0.0 | | | | | | | | |--- length_stay > 9.50 | | | | | | | | | |--- lead_time <= 29.50 | | | | | | | | | | |--- avg_price_per_room <= 76.22 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 76.22 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- lead_time > 
29.50 | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1.0 | | | | | | | |--- lead_time > 43.50 | | | | | | | | |--- length_stay <= 10.50 | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | |--- avg_price_per_room <= 76.54 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | | |--- avg_price_per_room > 76.54 | | | | | | | | | | | |--- truncated branch of depth 23 | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | | |--- truncated branch of depth 17 | | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | |--- length_stay > 10.50 | | | | | | | | | |--- weights: [0.00, 6.00] class: 1.0 | | | | | | |--- avg_price_per_room > 127.62 | | | | | | | |--- lead_time <= 142.50 | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | |--- avg_price_per_room <= 179.62 | | | | | | | | | | |--- lead_time <= 11.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- lead_time > 11.50 | | | | | | | | | | | |--- truncated branch of depth 20 | | | | | | | | | |--- avg_price_per_room > 179.62 | | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- lead_time <= 139.50 | | | | | | | | | | | |--- truncated branch of depth 20 | | | | | | | | | | |--- lead_time > 139.50 | | | | | | | | | | | |--- weights: [10.00, 0.00] class: 0.0 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- lead_time <= 100.50 | | | | | | | | | | | |--- weights: [49.00, 0.00] class: 0.0 | | | | | | | | | | |--- lead_time > 100.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- lead_time > 142.50 | | | | | | | | |--- 
avg_price_per_room <= 142.65 | | | | | | | | | |--- arrival_month <= 10.00 | | | | | | | | | | |--- length_stay <= 3.50 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0.0 | | | | | | | | | | |--- length_stay > 3.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- arrival_month > 10.00 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | |--- avg_price_per_room > 142.65 | | | | | | | | | |--- avg_price_per_room <= 182.49 | | | | | | | | | | |--- weights: [0.00, 11.00] class: 1.0 | | | | | | | | | |--- avg_price_per_room > 182.49 | | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | |--- required_car_parking_space > 0.50 | | | | | | |--- room_type_reserved_Room_Type 7 <= 0.50 | | | | | | | |--- weights: [180.00, 0.00] class: 0.0 | | | | | | |--- room_type_reserved_Room_Type 7 > 0.50 | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | |--- no_of_special_requests_log > 0.90 | | | |--- lead_time <= 90.50 | | | | |--- length_stay <= 12.00 | | | | | |--- length_stay <= 4.50 | | | | | | |--- length_stay <= 3.50 | | | | | | | |--- weights: [1689.00, 0.00] class: 0.0 | | | | | | |--- length_stay > 3.50 | | | | | | | |--- room_type_reserved_Room_Type 6 <= 0.50 | | | | | | | | |--- avg_price_per_room <= 90.05 | | | | | | | | | |--- lead_time <= 48.00 | | | | | | | | | | |--- arrival_month <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_month > 2.50 | | | | | | | | | | | |--- weights: [61.00, 0.00] class: 0.0 | | | | | | | | | |--- lead_time > 48.00 | | | | | | | | | | |--- avg_price_per_room <= 89.85 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- avg_price_per_room > 89.85 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | |--- 
avg_price_per_room > 90.05 | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- lead_time <= 54.50 | | | | | | | | | | | |--- weights: [221.00, 0.00] class: 0.0 | | | | | | | | | | |--- lead_time > 54.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | |--- lead_time <= 28.50 | | | | | | | | | | | |--- weights: [15.00, 0.00] class: 0.0 | | | | | | | | | | |--- lead_time > 28.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- room_type_reserved_Room_Type 6 > 0.50 | | | | | | | | |--- lead_time <= 31.00 | | | | | | | | | |--- weights: [13.00, 0.00] class: 0.0 | | | | | | | | |--- lead_time > 31.00 | | | | | | | | | |--- avg_price_per_room <= 159.42 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | | | |--- avg_price_per_room > 159.42 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | |--- length_stay > 4.50 | | | | | | |--- no_of_special_requests_log <= 1.24 | | | | | | | |--- length_stay <= 6.50 | | | | | | | | |--- avg_price_per_room <= 92.33 | | | | | | | | | |--- avg_price_per_room <= 90.95 | | | | | | | | | | |--- lead_time <= 54.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time > 54.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- avg_price_per_room > 90.95 | | | | | | | | | | |--- lead_time <= 11.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | | | | |--- lead_time > 11.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- avg_price_per_room > 92.33 | | | | | | | | | |--- lead_time <= 80.50 | | | | | | | | | | |--- lead_time <= 11.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- lead_time > 11.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- lead_time > 80.50 | | | | | | | | | | |--- 
lead_time <= 81.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | | |--- lead_time > 81.50 | | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0.0 | | | | | | | |--- length_stay > 6.50 | | | | | | | | |--- lead_time <= 9.00 | | | | | | | | | |--- weights: [13.00, 0.00] class: 0.0 | | | | | | | | |--- lead_time > 9.00 | | | | | | | | | |--- lead_time <= 34.50 | | | | | | | | | | |--- avg_price_per_room <= 83.24 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1.0 | | | | | | | | | | |--- avg_price_per_room > 83.24 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- lead_time > 34.50 | | | | | | | | | | |--- lead_time <= 72.50 | | | | | | | | | | | |--- weights: [19.00, 0.00] class: 0.0 | | | | | | | | | | |--- lead_time > 72.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | |--- no_of_special_requests_log > 1.24 | | | | | | | |--- weights: [69.00, 0.00] class: 0.0 | | | | |--- length_stay > 12.00 | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | |--- lead_time > 90.50 | | | | |--- avg_price_per_room <= 202.95 | | | | | |--- arrival_month <= 8.50 | | | | | | |--- lead_time <= 150.50 | | | | | | | |--- length_stay <= 5.50 | | | | | | | | |--- avg_price_per_room <= 80.33 | | | | | | | | | |--- avg_price_per_room <= 76.37 | | | | | | | | | | |--- weights: [22.00, 0.00] class: 0.0 | | | | | | | | | |--- avg_price_per_room > 76.37 | | | | | | | | | | |--- lead_time <= 98.00 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | | |--- lead_time > 98.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | |--- avg_price_per_room > 80.33 | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | |--- lead_time <= 115.00 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | | |--- lead_time > 115.00 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | 
| | | |--- arrival_month <= 7.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | |--- length_stay > 5.50 | | | | | | | | |--- no_of_children_log <= 0.35 | | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | | |--- lead_time <= 142.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time > 142.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- no_of_children_log > 0.35 | | | | | | | | | |--- no_of_special_requests_log <= 1.24 | | | | | | | | | | |--- lead_time <= 105.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | | |--- lead_time > 105.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- no_of_special_requests_log > 1.24 | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0.0 | | | | | | |--- lead_time > 150.50 | | | | | | | |--- avg_price_per_room <= 103.50 | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | |--- avg_price_per_room > 103.50 | | | | | | | | |--- weights: [0.00, 3.00] class: 1.0 | | | | | |--- arrival_month > 8.50 | | | | | | |--- no_of_special_requests_log <= 1.24 | | | | | | | |--- avg_price_per_room <= 90.42 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- lead_time <= 107.00 | | | | | | | | | | |--- avg_price_per_room <= 70.52 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 70.52 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- lead_time > 107.00 | | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | | |--- 
weights: [0.00, 1.00] class: 1.0 | | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- lead_time <= 101.00 | | | | | | | | | | |--- weights: [11.00, 0.00] class: 0.0 | | | | | | | | | |--- lead_time > 101.00 | | | | | | | | | | |--- lead_time <= 104.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | | |--- lead_time > 104.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | |--- avg_price_per_room > 90.42 | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | |--- weights: [11.00, 0.00] class: 0.0 | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | |--- avg_price_per_room <= 153.15 | | | | | | | | | | |--- avg_price_per_room <= 92.60 | | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 92.60 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | | |--- avg_price_per_room > 153.15 | | | | | | | | | | |--- lead_time <= 100.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 100.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | |--- no_of_special_requests_log > 1.24 | | | | | | | |--- weights: [52.00, 0.00] class: 0.0 | | | | |--- avg_price_per_room > 202.95 | | | | | |--- weights: [0.00, 7.00] class: 1.0 |--- lead_time > 151.50 | |--- avg_price_per_room <= 100.04 | | |--- no_of_special_requests_log <= 0.35 | | | |--- market_segment_type_Online <= 0.50 | | | | |--- no_of_adults <= 1.50 | | | | | |--- lead_time <= 163.50 | | | | | | |--- length_stay <= 3.50 | | | | | | | |--- length_stay <= 2.50 | | | | | | | | |--- weights: [4.00, 0.00] class: 0.0 | | | | | | | |--- length_stay > 2.50 | | | | | | | | |--- weights: [1.00, 1.00] class: 0.0 | | | | | | |--- length_stay > 3.50 | | | | | | | |--- weights: [0.00, 15.00] class: 1.0 | | | | | |--- lead_time > 163.50 | | | | 
| | |--- lead_time <= 341.00 | | | | | | | |--- lead_time <= 173.00 | | | | | | | | |--- avg_price_per_room <= 97.50 | | | | | | | | | |--- length_stay <= 3.00 | | | | | | | | | | |--- weights: [0.00, 9.00] class: 1.0 | | | | | | | | | |--- length_stay > 3.00 | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 97.50 | | | | | | | | | |--- length_stay <= 1.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | |--- length_stay > 1.50 | | | | | | | | | | |--- weights: [61.00, 6.00] class: 0.0 | | | | | | | |--- lead_time > 173.00 | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | |--- avg_price_per_room <= 88.00 | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0.0 | | | | | | | | | |--- avg_price_per_room > 88.00 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1.0 | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | |--- avg_price_per_room <= 98.00 | | | | | | | | | | |--- avg_price_per_room <= 55.21 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 55.21 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- avg_price_per_room > 98.00 | | | | | | | | | | |--- lead_time <= 231.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | | |--- lead_time > 231.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | |--- lead_time > 341.00 | | | | | | | |--- length_stay <= 5.50 | | | | | | | | |--- lead_time <= 402.00 | | | | | | | | | |--- avg_price_per_room <= 80.00 | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0.0 | | | | | | | | | |--- avg_price_per_room > 80.00 | | | | | | | | | | |--- lead_time <= 381.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 381.50 | | | | | | | | | | | |--- weights: [3.00, 2.00] class: 0.0 | | | | | | | | |--- lead_time > 402.00 | | | | | | | | | |--- weights: [0.00, 3.00] 
class: 1.0 | | | | | | | |--- length_stay > 5.50 | | | | | | | | |--- avg_price_per_room <= 88.33 | | | | | | | | | |--- weights: [0.00, 7.00] class: 1.0 | | | | | | | | |--- avg_price_per_room > 88.33 | | | | | | | | | |--- weights: [1.00, 1.00] class: 0.0 | | | | |--- no_of_adults > 1.50 | | | | | |--- avg_price_per_room <= 84.58 | | | | | | |--- lead_time <= 244.00 | | | | | | | |--- length_stay <= 2.50 | | | | | | | | |--- lead_time <= 166.50 | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | | | | | |--- lead_time > 166.50 | | | | | | | | | |--- lead_time <= 229.50 | | | | | | | | | | |--- avg_price_per_room <= 69.34 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | | |--- avg_price_per_room > 69.34 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- lead_time > 229.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | |--- length_stay > 2.50 | | | | | | | | |--- avg_price_per_room <= 27.07 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | |--- avg_price_per_room > 27.07 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | |--- avg_price_per_room <= 66.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 66.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | |--- lead_time > 244.00 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- avg_price_per_room <= 75.83 | | | | | | | | | |--- length_stay <= 1.50 | | | | | | | | | | |--- avg_price_per_room <= 66.00 | | | | | | | | | | | |--- weights: [0.00, 8.00] class: 1.0 | | | | | | | | | | |--- avg_price_per_room > 66.00 | | | | | | | | | | | |--- weights: [19.00, 0.00] class: 0.0 | | | | | | | | | |--- length_stay > 1.50 | | | | | | | | | | |--- length_stay <= 6.00 | | | | | | | | | | | 
|--- truncated branch of depth 7 | | | | | | | | | | |--- length_stay > 6.00 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 75.83 | | | | | | | | | |--- lead_time <= 292.50 | | | | | | | | | | |--- length_stay <= 6.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- length_stay > 6.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- lead_time > 292.50 | | | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | | | |--- weights: [0.00, 23.00] class: 1.0 | | | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- weights: [37.00, 0.00] class: 0.0 | | | | | |--- avg_price_per_room > 84.58 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | |--- lead_time <= 316.00 | | | | | | | | | |--- no_of_adults <= 2.50 | | | | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | |--- no_of_adults > 2.50 | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0.0 | | | | | | | | |--- lead_time > 316.00 | | | | | | | | | |--- lead_time <= 338.00 | | | | | | | | | | |--- weights: [7.00, 0.00] class: 0.0 | | | | | | | | | |--- lead_time > 338.00 | | | | | | | | | | |--- weights: [1.00, 5.00] class: 1.0 | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | |--- weights: [6.00, 0.00] class: 0.0 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- weights: [9.00, 0.00] class: 0.0 | | | |--- market_segment_type_Online > 0.50 | | | | |--- avg_price_per_room <= 2.50 | | | | | |--- no_of_adults <= 1.50 | | | | | | |--- lead_time <= 285.50 | | | | | | | |--- weights: [11.00, 0.00] class: 0.0 | | | | | 
| |--- lead_time > 285.50 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | |--- no_of_adults > 1.50 | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | |--- avg_price_per_room > 2.50 | | | | | |--- arrival_month <= 11.50 | | | | | | |--- weights: [0.00, 525.00] class: 1.0 | | | | | |--- arrival_month > 11.50 | | | | | | |--- length_stay <= 3.50 | | | | | | | |--- lead_time <= 204.00 | | | | | | | | |--- weights: [0.00, 11.00] class: 1.0 | | | | | | | |--- lead_time > 204.00 | | | | | | | | |--- lead_time <= 214.50 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | | |--- lead_time > 214.50 | | | | | | | | | |--- lead_time <= 275.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1.0 | | | | | | | | | |--- lead_time > 275.50 | | | | | | | | | | |--- weights: [0.00, 7.00] class: 1.0 | | | | | | |--- length_stay > 3.50 | | | | | | | |--- avg_price_per_room <= 80.51 | | | | | | | | |--- weights: [0.00, 41.00] class: 1.0 | | | | | | | |--- avg_price_per_room > 80.51 | | | | | | | | |--- avg_price_per_room <= 81.43 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 81.43 | | | | | | | | | |--- weights: [0.00, 13.00] class: 1.0 | | |--- no_of_special_requests_log > 0.35 | | | |--- market_segment_type_Offline <= 0.50 | | | | |--- lead_time <= 180.50 | | | | | |--- lead_time <= 159.50 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- lead_time <= 152.50 | | | | | | | | |--- avg_price_per_room <= 90.81 | | | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 
| | | | | | | | | | | |--- weights: [1.00, 2.00] class: 1.0 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 90.81 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | |--- lead_time > 152.50 | | | | | | | | |--- lead_time <= 156.50 | | | | | | | | | |--- weights: [12.00, 0.00] class: 0.0 | | | | | | | | |--- lead_time > 156.50 | | | | | | | | | |--- length_stay <= 4.50 | | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0.0 | | | | | | | | | |--- length_stay > 4.50 | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0.0 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- avg_price_per_room <= 87.12 | | | | | | | | |--- lead_time <= 158.50 | | | | | | | | | |--- weights: [0.00, 7.00] class: 1.0 | | | | | | | | |--- lead_time > 158.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | |--- avg_price_per_room > 87.12 | | | | | | | | |--- avg_price_per_room <= 89.75 | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 89.75 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | |--- lead_time > 159.50 | | | | | | |--- no_of_adults <= 0.50 | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | |--- no_of_adults > 0.50 | | | | | | | |--- avg_price_per_room <= 93.44 | | | | | | | | |--- length_stay <= 5.50 | | | | | | | | | |--- lead_time <= 162.50 | | | | | | | | | | |--- lead_time <= 161.50 | | | | | | | | | | | |--- weights: [6.00, 
0.00] class: 0.0 | | | | | | | | | | |--- lead_time > 161.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- lead_time > 162.50 | | | | | | | | | | |--- length_stay <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- length_stay > 1.50 | | | | | | | | | | | |--- weights: [52.00, 0.00] class: 0.0 | | | | | | | | |--- length_stay > 5.50 | | | | | | | | | |--- avg_price_per_room <= 88.38 | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0.0 | | | | | | | | | |--- avg_price_per_room > 88.38 | | | | | | | | | | |--- avg_price_per_room <= 90.92 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | | |--- avg_price_per_room > 90.92 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0.0 | | | | | | | |--- avg_price_per_room > 93.44 | | | | | | | | |--- lead_time <= 178.50 | | | | | | | | | |--- avg_price_per_room <= 93.67 | | | | | | | | | | |--- length_stay <= 5.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | | |--- length_stay > 5.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | | | |--- avg_price_per_room > 93.67 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | |--- lead_time > 178.50 | | | | | | | | | |--- lead_time <= 179.50 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1.0 | | | | | | | | | |--- lead_time > 179.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- weights: [4.00, 1.00] class: 0.0 | | | | |--- lead_time > 180.50 | | | | | |--- length_stay <= 3.50 | | | | | | |--- no_of_special_requests_log <= 1.24 | | | | | | | |--- 
lead_time <= 187.50 | | | | | | | | |--- arrival_month <= 4.00 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | |--- arrival_month > 4.00 | | | | | | | | | |--- avg_price_per_room <= 78.30 | | | | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | | |--- avg_price_per_room > 78.30 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- weights: [0.00, 20.00] class: 1.0 | | | | | | | |--- lead_time > 187.50 | | | | | | | | |--- lead_time <= 304.50 | | | | | | | | | |--- avg_price_per_room <= 78.90 | | | | | | | | | | |--- lead_time <= 237.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 237.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | |--- avg_price_per_room > 78.90 | | | | | | | | | | |--- length_stay <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- length_stay > 1.50 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | |--- lead_time > 304.50 | | | | | | | | | |--- arrival_month <= 9.00 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | | |--- arrival_month > 9.00 | | | | | | | | | | |--- weights: [0.00, 17.00] class: 1.0 | | | | | | |--- no_of_special_requests_log > 1.24 | | | | | | | |--- weights: [11.00, 0.00] class: 0.0 | | | | | |--- length_stay > 3.50 | | | | | | |--- length_stay <= 13.50 | | | | | | | |--- no_of_special_requests_log <= 1.24 | | | | | | | | |--- avg_price_per_room <= 68.32 | | | | | | | | | |--- arrival_month <= 11.00 | | | | | | | | | | |--- weights: [13.00, 0.00] class: 0.0 | | | | | | | | | |--- arrival_month > 11.00 | | | | | | | | | | |--- 
lead_time <= 247.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 247.00 | | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 68.32 | | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | | |--- avg_price_per_room <= 81.12 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 81.12 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | | |--- avg_price_per_room <= 70.89 | | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1.0 | | | | | | | | | | |--- avg_price_per_room > 70.89 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | |--- no_of_special_requests_log > 1.24 | | | | | | | | |--- weights: [17.00, 0.00] class: 0.0 | | | | | | |--- length_stay > 13.50 | | | | | | | |--- weights: [0.00, 5.00] class: 1.0 | | | |--- market_segment_type_Offline > 0.50 | | | | |--- lead_time <= 368.00 | | | | | |--- lead_time <= 348.50 | | | | | | |--- no_of_adults <= 2.50 | | | | | | | |--- length_stay <= 7.50 | | | | | | | | |--- lead_time <= 331.00 | | | | | | | | | |--- no_of_special_requests_log <= 0.90 | | | | | | | | | | |--- weights: [137.00, 0.00] class: 0.0 | | | | | | | | | |--- no_of_special_requests_log > 0.90 | | | | | | | | | | |--- length_stay <= 5.50 | | | | | | | | | | | |--- weights: [12.00, 0.00] class: 0.0 | | | | | | | | | | |--- length_stay > 5.50 | | | | | | | | | | | |--- weights: [2.00, 1.00] class: 0.0 | | | | | | | | |--- lead_time > 331.00 | | | | | | | | | |--- lead_time <= 336.50 | | | | | | | | | | |--- weights: [2.00, 1.00] class: 0.0 | | | | | | | | | |--- lead_time > 336.50 | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0.0 | | | | | | | |--- length_stay > 7.50 | | | | | | | | |--- avg_price_per_room <= 80.74 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | | | |--- avg_price_per_room > 80.74 
| | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | |--- no_of_adults > 2.50 | | | | | | | |--- lead_time <= 196.00 | | | | | | | | |--- weights: [7.00, 0.00] class: 0.0 | | | | | | | |--- lead_time > 196.00 | | | | | | | | |--- no_of_special_requests_log <= 0.90 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1.0 | | | | | | | | |--- no_of_special_requests_log > 0.90 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | |--- lead_time > 348.50 | | | | | | |--- avg_price_per_room <= 58.50 | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | |--- avg_price_per_room > 58.50 | | | | | | | |--- weights: [6.00, 2.00] class: 0.0 | | | | |--- lead_time > 368.00 | | | | | |--- lead_time <= 381.50 | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | |--- lead_time > 381.50 | | | | | | |--- weights: [1.00, 1.00] class: 0.0 | |--- avg_price_per_room > 100.04 | | |--- arrival_month <= 11.50 | | | |--- no_of_special_requests_log <= 1.24 | | | | |--- weights: [0.00, 2108.00] class: 1.0 | | | |--- no_of_special_requests_log > 1.24 | | | | |--- weights: [31.00, 0.00] class: 0.0 | | |--- arrival_month > 11.50 | | | |--- no_of_special_requests_log <= 0.35 | | | | |--- weights: [47.00, 0.00] class: 0.0 | | | |--- no_of_special_requests_log > 0.35 | | | | |--- lead_time <= 289.50 | | | | | |--- no_of_special_requests_log <= 0.90 | | | | | | |--- avg_price_per_room <= 114.59 | | | | | | | |--- weights: [2.00, 0.00] class: 0.0 | | | | | | |--- avg_price_per_room > 114.59 | | | | | | | |--- weights: [0.00, 6.00] class: 1.0 | | | | | |--- no_of_special_requests_log > 0.90 | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | |--- avg_price_per_room <= 110.46 | | | | | | | | |--- lead_time <= 206.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0.0 | | | | | | | | |--- lead_time > 206.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | | | | |--- avg_price_per_room > 110.46 | | | | | | | | |--- weights: 
[7.00, 0.00] class: 0.0 | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | |--- weights: [0.00, 1.00] class: 1.0 | | | | |--- lead_time > 289.50 | | | | | |--- weights: [0.00, 7.00] class: 1.0
# checking out what variables are being prioritized by the model.
print (pd.DataFrame(dTree.feature_importances_, columns = ["Imp"], index = X_train.columns).sort_values(by = 'Imp', ascending = False))
| | Imp |
|---|---|
| lead_time | 0.397083 |
| avg_price_per_room | 0.207281 |
| market_segment_type_Online | 0.092752 |
| arrival_month | 0.084426 |
| length_stay | 0.073260 |
| no_of_special_requests_log | 0.068314 |
| no_of_adults | 0.029704 |
| type_of_meal_plan_Not Selected | 0.011106 |
| room_type_reserved_Room_Type 4 | 0.008201 |
| required_car_parking_space | 0.007376 |
| no_of_children_log | 0.005896 |
| type_of_meal_plan_Meal Plan 2 | 0.004562 |
| market_segment_type_Offline | 0.003515 |
| room_type_reserved_Room_Type 2 | 0.002241 |
| room_type_reserved_Room_Type 5 | 0.001711 |
| room_type_reserved_Room_Type 6 | 0.000748 |
| market_segment_type_Corporate | 0.000693 |
| repeated_guest | 0.000471 |
| room_type_reserved_Room_Type 7 | 0.000337 |
| no_of_previous_cancellations_log | 0.000323 |
| room_type_reserved_Room_Type 3 | 0.000000 |
| market_segment_type_Complementary | 0.000000 |
| no_of_previous_bookings_not_canceled_log | 0.000000 |
| type_of_meal_plan_Meal Plan 3 | 0.000000 |
importances = dTree.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12,12))
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='violet', align='center')
plt.yticks(range(len(indices)), [the_features[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()
# Pre prune the model with max depth hyperparameter
dTree1 = DecisionTreeClassifier(criterion = 'gini',max_depth=3,random_state=1)
dTree1.fit(X_train, y_train)
DecisionTreeClassifier(max_depth=3, random_state=1)
# another confusion matrix
make_confusion_matrix(dTree1, y_test)
# The accuracy on the pre pruned tree.
print("Accuracy on training set : ",dTree1.score(X_train, y_train))
print("Accuracy on test set : ",dTree1.score(X_test, y_test))
# Check the recall with the get_recall_score user defined function
get_recall_score(dTree1)
Accuracy on training set : 0.7844202898550725
Accuracy on test set : 0.7913259211614444
Recall on training set : 0.7315556618438359
Recall on test set : 0.7385008517887564
# Let's see the pre pruned tree
plt.figure(figsize=(15,10))
tree.plot_tree(dTree1,feature_names=the_features,filled=True,fontsize=9,node_ids=True,class_names=True)
plt.show()
print(tree.export_text(dTree1,feature_names=the_features,show_weights=True))
|--- lead_time <= 151.50
|   |--- no_of_special_requests_log <= 0.35
|   |   |--- market_segment_type_Online <= 0.50
|   |   |   |--- weights: [4614.00, 781.00] class: 0.0
|   |   |--- market_segment_type_Online > 0.50
|   |   |   |--- weights: [2504.00, 2768.00] class: 1.0
|   |--- no_of_special_requests_log > 0.35
|   |   |--- no_of_special_requests_log <= 0.90
|   |   |   |--- weights: [5624.00, 1055.00] class: 0.0
|   |   |--- no_of_special_requests_log > 0.90
|   |   |   |--- weights: [2919.00, 145.00] class: 0.0
|--- lead_time > 151.50
|   |--- avg_price_per_room <= 100.04
|   |   |--- no_of_special_requests_log <= 0.35
|   |   |   |--- weights: [694.00, 1242.00] class: 1.0
|   |   |--- no_of_special_requests_log > 0.35
|   |   |   |--- weights: [586.00, 249.00] class: 0.0
|   |--- avg_price_per_room > 100.04
|   |   |--- arrival_month <= 11.50
|   |   |   |--- weights: [31.00, 2108.00] class: 1.0
|   |   |--- arrival_month > 11.50
|   |   |   |--- weights: [57.00, 15.00] class: 0.0
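Because the depth-3 tree has only eight leaves, the training metrics printed earlier can be checked by hand from the leaf weights. The sketch below is a minimal illustration, assuming the weights are unweighted training sample counts in the order [not canceled, canceled]; the names `leaves`, `tp`, `fn`, and `tn` are introduced here for illustration and do not appear elsewhere in the notebook.

```python
# Leaf weights copied from the export_text output of dTree1, top to bottom:
# (not_canceled, canceled, predicted_class)
leaves = [
    (4614, 781, 0),
    (2504, 2768, 1),
    (5624, 1055, 0),
    (2919, 145, 0),
    (694, 1242, 1),
    (586, 249, 0),
    (31, 2108, 1),
    (57, 15, 0),
]

# Canceled bookings that land in a leaf predicting class 1 are true positives;
# canceled bookings in a leaf predicting class 0 are false negatives.
tp = sum(c for n, c, p in leaves if p == 1)
fn = sum(c for n, c, p in leaves if p == 0)
# Non-canceled bookings in a leaf predicting class 0 are true negatives.
tn = sum(n for n, c, p in leaves if p == 0)
total = sum(n + c for n, c, p in leaves)

recall = tp / (tp + fn)
accuracy = (tp + tn) / total
print("Recall on training set (from leaves)   :", recall)
print("Accuracy on training set (from leaves) :", accuracy)
```

Under that assumption the two values agree with the training recall (about 0.7316) and training accuracy (about 0.7844) reported for dTree1.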
# Looking at the feature importances of this model
importances = dTree1.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(10,10))
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='violet', align='center')
plt.yticks(range(len(indices)), [the_features[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()
# Choose the type of classifier.
estimator = DecisionTreeClassifier(random_state=1)
# Grid of parameters to choose from
## add from article
parameters = {'max_depth': np.arange(1, 10),
              'min_samples_leaf': [1, 2, 5, 7, 10, 15, 20],
              'max_leaf_nodes': [2, 3, 5, 10],
              'min_impurity_decrease': [0.001, 0.01, 0.1]
              }
# recall is the scoring function used to compare parameter combinations
recall_scorer = metrics.make_scorer(metrics.recall_score)
# Run the grid search
grid_obj = GridSearchCV(estimator, parameters, scoring=recall_scorer, cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
estimator = grid_obj.best_estimator_
# Fit the best algorithm to the data.
estimator.fit(X_train, y_train)
DecisionTreeClassifier(max_depth=3, max_leaf_nodes=5,
min_impurity_decrease=0.001, random_state=1)
The tuned estimator uses the following parameters:
-max_depth=3
-max_leaf_nodes=5
-min_impurity_decrease=0.001
-random_state=1
# run the estimator in a confusion matrix
make_confusion_matrix(estimator,y_test)
# The accuracy on the estimator tree.
print("Accuracy on training set : ",estimator.score(X_train, y_train))
print("Accuracy on test set : ",estimator.score(X_test, y_test))
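# Check the recall of the tuned estimator with the user-defined get_recall_score function
# (the recall figures recorded below came from a run that passed dTree1 here, so they repeat the depth-3 tree's recall)
get_recall_score(estimator)
Accuracy on training set : 0.7694943289224953
Accuracy on test set : 0.7719378847744188
Recall on training set : 0.7315556618438359
Recall on test set : 0.7385008517887564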
plt.figure(figsize=(15,10))
tree.plot_tree(estimator,feature_names=the_features,filled=True,fontsize=9,node_ids=True,class_names=True)
plt.show()
importances = estimator.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12,12))
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='violet', align='center')
plt.yticks(range(len(indices)), [the_features[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()
-We are still optimizing for recall rather than accuracy, so next we look at cost-complexity pruning of the decision tree classifier.
clf = DecisionTreeClassifier(random_state=1)
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities
pd.DataFrame(path)
| | ccp_alphas | impurities |
|---|---|---|
| 0 | 0.000000e+00 | 0.009478 |
| 1 | 0.000000e+00 | 0.009478 |
| 2 | 0.000000e+00 | 0.009478 |
| 3 | 4.688391e-07 | 0.009478 |
| 4 | 5.329960e-07 | 0.009479 |
| ... | ... | ... |
| 1508 | 6.665684e-03 | 0.286897 |
| 1509 | 1.304480e-02 | 0.299942 |
| 1510 | 1.725993e-02 | 0.317202 |
| 1511 | 2.399048e-02 | 0.365183 |
| 1512 | 7.657789e-02 | 0.441761 |
1513 rows × 2 columns
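In cost-complexity pruning, the effective alpha of a subtree is the increase in total leaf impurity caused by collapsing it, divided by the number of leaves that disappear (the subtree's leaf count minus one). The sketch below checks that relationship against the last rows of the table above. It assumes a single subtree is collapsed at each step and uses the displayed (rounded) values; the names `ccp_alphas_tail`, `impurities_tail`, and `leaves_removed` are illustrative only.

```python
# Last five rows of the pruning path table (values as displayed, so rounded).
ccp_alphas_tail = [6.665684e-03, 1.304480e-02, 1.725993e-02, 2.399048e-02, 7.657789e-02]
impurities_tail = [0.286897, 0.299942, 0.317202, 0.365183, 0.441761]

# For each pruning step, impurity increase / alpha should be close to the
# number of leaves removed by collapsing the weakest subtree.
leaves_removed = []
for k in range(1, len(ccp_alphas_tail)):
    impurity_increase = impurities_tail[k] - impurities_tail[k - 1]
    leaves_removed.append(round(impurity_increase / ccp_alphas_tail[k]))

print(leaves_removed)  # [1, 1, 2, 1]
```

The final step removes one leaf, which is consistent with collapsing a two-leaf tree into a single root node.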
fig, ax = plt.subplots(figsize=(10,5))
ax.plot(ccp_alphas[:-1], impurities[:-1], marker='o', drawstyle="steps-post")
ax.set_xlabel("effective alpha")
ax.set_ylabel("total impurity of leaves")
ax.set_title("Total Impurity vs effective alpha for training set")
plt.show()
# Fit a decision tree classifier for every alpha
clfs = []
for ccp_alpha in ccp_alphas:
    clf = DecisionTreeClassifier(random_state=1, ccp_alpha=ccp_alpha)
    clf.fit(X_train, y_train)
    clfs.append(clf)
print("Number of nodes in the last tree is: {} with ccp_alpha: {}".format(
    clfs[-1].tree_.node_count, ccp_alphas[-1]))
Number of nodes in the last tree is: 1 with ccp_alpha: 0.0765778947737134
clfs = clfs[:-1]
ccp_alphas = ccp_alphas[:-1]
node_counts = [clf.tree_.node_count for clf in clfs]
depth = [clf.tree_.max_depth for clf in clfs]
fig, ax = plt.subplots(2, 1,figsize=(10,7))
ax[0].plot(ccp_alphas, node_counts, marker='o', drawstyle="steps-post")
ax[0].set_xlabel("alpha")
ax[0].set_ylabel("number of nodes")
ax[0].set_title("Number of nodes vs alpha")
ax[1].plot(ccp_alphas, depth, marker='o', drawstyle="steps-post")
ax[1].set_xlabel("alpha")
ax[1].set_ylabel("depth of tree")
ax[1].set_title("Depth vs alpha")
fig.tight_layout()
train_scores = [clf.score(X_train, y_train) for clf in clfs]
test_scores = [clf.score(X_test, y_test) for clf in clfs]
fig, ax = plt.subplots(figsize=(10,5))
ax.set_xlabel("alpha")
ax.set_ylabel("accuracy")
ax.set_title("Accuracy vs alpha for training and testing sets")
ax.plot(ccp_alphas, train_scores, marker='o', label="train",
drawstyle="steps-post")
ax.plot(ccp_alphas, test_scores, marker='o', label="test",
drawstyle="steps-post")
ax.legend()
plt.show()
index_best_model = np.argmax(test_scores)
best_model = clfs[index_best_model]
print(best_model)
print('Training accuracy of best model: ',best_model.score(X_train, y_train))
print('Test accuracy of best model: ',best_model.score(X_test, y_test))
DecisionTreeClassifier(ccp_alpha=9.904212140385933e-05, random_state=1)
Training accuracy of best model: 0.90544265910523
Test accuracy of best model: 0.8783423688321235
recall_train = []
for clf in clfs:
    pred_train3 = clf.predict(X_train)
    values_train = metrics.recall_score(y_train, pred_train3)
    recall_train.append(values_train)
recall_test = []
for clf in clfs:
    pred_test3 = clf.predict(X_test)
    values_test = metrics.recall_score(y_test, pred_test3)
    recall_test.append(values_test)
fig, ax = plt.subplots(figsize=(15,5))
ax.set_xlabel("alpha")
ax.set_ylabel("Recall")
ax.set_title("Recall vs alpha for training and testing sets")
ax.plot(ccp_alphas, recall_train, marker='o', label="train",
drawstyle="steps-post")
ax.plot(ccp_alphas, recall_test, marker='o', label="test",
drawstyle="steps-post")
ax.legend()
plt.show()
# select the model with the highest test recall
index_best_model = np.argmax(recall_test)
best_model = clfs[index_best_model]
print(best_model)
DecisionTreeClassifier(ccp_alpha=2.450465588461807e-05, random_state=1)
#another confusion matrix
make_confusion_matrix(best_model,y_test)
# Recall on train and test
get_recall_score(best_model)
Recall on training set : 0.9790744947985173
Recall on test set : 0.7935831913685406
# plot the additional tree
plt.figure(figsize=(17,15))
tree.plot_tree(best_model,feature_names=the_features,filled=True,fontsize=9,node_ids=True,class_names=True)
plt.show()
# show which features this model relied on
importances = best_model.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12,12))
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='violet', align='center')
plt.yticks(range(len(indices)), [the_features[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()
comparison_frame = pd.DataFrame({
    'Model': ['Initial decision tree model',
              'Decision tree with restricted maximum depth',
              'Decision tree with hyperparameter tuning',
              'Decision tree with post-pruning'],
    'Train_Recall': [.981, .732, .732, .979],
    'Test_Recall': [.792, .739, .739, .794]})
comparison_frame
| | Model | Train_Recall | Test_Recall |
|---|---|---|---|
| 0 | Initial decision tree model | 0.981 | 0.792 |
| 1 | Decision tree with restricted maximum depth | 0.732 | 0.739 |
| 2 | Decision tree with hyperparameter tuning | 0.732 | 0.739 |
| 3 | Decision tree with post-pruning | 0.979 | 0.794 |
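One way to read the comparison is through the gap between train and test recall, which hints at how much each model overfits. The following is a small sketch using only the recall values from the table; the `models` and `gaps` names are introduced here for illustration.

```python
# (train_recall, test_recall) copied from the comparison table.
models = {
    'Initial decision tree model': (0.981, 0.792),
    'Decision tree with restricted maximum depth': (0.732, 0.739),
    'Decision tree with hyperparameter tuning': (0.732, 0.739),
    'Decision tree with post-pruning': (0.979, 0.794),
}

# A large positive gap means the model recalls far more cancellations on the
# data it was trained on than on unseen data.
gaps = {name: round(train - test, 3) for name, (train, test) in models.items()}

for name, gap in gaps.items():
    print(f"{name}: train - test recall = {gap:+.3f}")
```

On these figures the initial and post-pruned trees have the highest test recall but also a gap of roughly 0.19 between train and test, while the depth-restricted tree generalizes almost evenly at a lower recall.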
The three most important variables for predicting cancellations were lead time (how far in advance the room was booked), the number of special requests for the stay, and the average price of the room. Bookings made 151 days (about 5 months) or less in advance were much less likely to be canceled, and guests who also made a special request were very unlikely to cancel. I believe this is an opportunity. Bookings made more than 151 days in advance were more likely to be canceled, and price was the determining factor for those cancellations: the likelihood of a cancellation increased if the room was priced over 100.04 euros. This leads me to believe that these guests booked early and subsequently found a better deal.
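The cancellation rates behind these observations can be approximated from the leaf weights of the depth-3 tree (dTree1). This is a rough sketch, assuming the weights are unweighted training counts in the order [not canceled, canceled]; the grouping into `early`, `late_cheaper`, and `late_pricier` and the helper `cancel_rate` are illustrative names, not part of the notebook.

```python
# Leaf weights (not_canceled, canceled) from the depth-3 tree, grouped by the
# first two splits: lead_time <= 151.5, then avg_price_per_room <= 100.04.
early = [(4614, 781), (2504, 2768), (5624, 1055), (2919, 145)]  # lead_time <= 151.5
late_cheaper = [(694, 1242), (586, 249)]   # lead_time > 151.5, price <= 100.04
late_pricier = [(31, 2108), (57, 15)]      # lead_time > 151.5, price > 100.04


def cancel_rate(leaves):
    """Share of canceled bookings among all bookings in the given leaves."""
    not_canceled = sum(n for n, _ in leaves)
    canceled = sum(c for _, c in leaves)
    return canceled / (not_canceled + canceled)


rate_early = cancel_rate(early)
rate_late = cancel_rate(late_cheaper + late_pricier)
rate_late_cheaper = cancel_rate(late_cheaper)
rate_late_pricier = cancel_rate(late_pricier)

print(f"lead_time <= 151.5          : {rate_early:.3f}")
print(f"lead_time >  151.5          : {rate_late:.3f}")
print(f"  and price <= 100.04       : {rate_late_cheaper:.3f}")
print(f"  and price >  100.04       : {rate_late_pricier:.3f}")
```

On the training data, about 23% of bookings made within 151 days were canceled versus about 73% of those made earlier; among the earlier bookings, about 96% of those priced above 100.04 were canceled compared with about 54% of the cheaper ones.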